AD-A179 762   AN EVALUATION OF SPEECH RECOGNITION TECHNOLOGY(U)
BATTELLE MEMORIAL INST RESEARCH TRIANGLE PARK NC
M G JOOST ET AL. 05 DEC 86 DAAG29-81-D-0100


UNCLASSIFIED 






DEPARTMENT OF INDUSTRIAL ENGINEERING
BOX 7906

NORTH CAROLINA STATE UNIVERSITY
RALEIGH, NORTH CAROLINA




An  Evaluation  of  Speech  Recognition  Technology 


by 


Michael  G  Joost 
Taryn  S  Moody 
Robert  D  Rodman 

North  Carolina  State  University 


for 


Product  Manager,  Army  Communicative  Systems 


5  December  1986 


Contract No. DAAG29-81-D-0100
Delivery  Order  2439 
Scientific  Services  Program 


The views, opinions, and/or findings contained in this report
are those of the authors and should not be construed as an
official Department of the Army position, policy, or decision,
unless so designated by other documentation.




1a REPORT SECURITY CLASSIFICATION

Unclassified 


2a  SECURITY  CLASSIFICATION  AUTHORITY 


2b DECLASSIFICATION / DOWNGRADING SCHEDULE


4  PERFORMING  ORGANIZATION  REPORT  NUMBER(S) 

Delivery  Order  2439 


6a NAME OF PERFORMING ORGANIZATION

Dr.  Michael  G.  Joost 


REPORT  DOCUMENTATION  PAGE 


1b RESTRICTIVE MARKINGS


Form Approved
OMB No. 0704-0188
Exp. Date: Jun 30, 1986


5  MONITORING  ORGANIZATION  REPORT  NUMBER(S) 

TCN  86-564 


6b. OFFICE SYMBOL (If applicable)   7a. NAME OF MONITORING ORGANIZATION

U.S.  Army  Research  Office 


7b. ADDRESS (City, State, and ZIP Code)

P.O. Box 12211

Research Triangle Park, NC 27709-2211


6c. ADDRESS (City, State, and ZIP Code)

Dept. of Industrial Engineering
NCSU, Box 7906
Raleigh, NC 27695


8a. NAME OF FUNDING / SPONSORING ORGANIZATION   8b. OFFICE SYMBOL (If applicable)   9. PROCUREMENT INSTRUMENT IDENTIFICATION NUMBER

U.S. Army Communicative Systems   AMCPM-ACS


8c. ADDRESS (City, State, and ZIP Code)

(Charles H. Clark, Jr.)
P.O. Box 4337
Ft. Eustis, VA 23604-0337


10. SOURCE OF FUNDING NUMBERS

PROGRAM ELEMENT NO.   PROJECT NO.   WORK UNIT ACCESSION NO.


11. TITLE (Include Security Classification)

An Evaluation of Speech Recognition Technology


12.  PERSONAL  AUTHOR(S) 

M.G.  Joost,  T.S.  Moody,  R.D.  Rodman 


13a. TYPE OF REPORT   13b. TIME COVERED   14. DATE OF REPORT (Year, Month, Day)   15. PAGE COUNT

FINAL REPORT   FROM 7 Jul 86 TO 7 Nov 86   1986 December 5   130


16. SUPPLEMENTARY NOTATION: Task was performed under a Scientific Services Agreement issued by
Battelle, Research Triangle Park Office, 200 Park Drive, P.O. Box 12297, Research Triangle
Park, NC 27709


17. COSATI CODES   18. SUBJECT TERMS (Continue on reverse if necessary and identify by block number)

SUB-GROUP


Speech recognition, recognition performance,
effects of noise.


19. ABSTRACT (Continue on reverse if necessary and identify by block number)

The last 15 years have seen the development of speech technology at a very rapid rate.
Unfortunately, the fact and fiction of recognition are not always easily separated.
This confusion is evident not only among users, but also often among system integrators.

This paper outlines the technology today and provides results from one set of benchmark
tests. Three tests were performed with live speakers in three noise environments. The
tasks used were: a) a sixteen-word discrete vocabulary, b) a 37-word connected speech
vocabulary, and c) a 30-word connected speech vocabulary which was very tightly
constrained (syntaxed). The noise environments included a "quiet" background, a
noisy background of loud voices, and the sounds associated with a normal (loud) vehicle
repair shop.


20. DISTRIBUTION / AVAILABILITY OF ABSTRACT

□ UNCLASSIFIED/UNLIMITED   ☒ SAME AS RPT   □ DTIC USERS

21. ABSTRACT SECURITY CLASSIFICATION

22a. NAME OF RESPONSIBLE INDIVIDUAL

22b. TELEPHONE (Include Area Code)

22c. OFFICE SYMBOL

DD FORM 1473, 84 MAR


83 APR edition may be used until exhausted.
All other editions are obsolete.


SECURITY CLASSIFICATION OF THIS PAGE


19. ABSTRACT (Continued)


The results indicate that virtually all systems tested could be made to perform
well with specific, well-motivated speakers and under all noise conditions. Where an
application does not require advanced features (e.g., large vocabularies or connected
speech), those features may become a liability by increasing error. In spite of this,
however, the technology is sufficiently mature to support many field applications.


Table of Contents

I. Abstract

II. Introduction

A. Objectives

III. Background

IV. Marketing Literature Review

A. Technology

1. Manner of Speaking

2. Speaker Dependence

3. Vocabulary Capacity

4. Training Support

5. Compatibility

6. Development tools

V. Performance Testing

A. Vocabulary Selection

B. Scenario

C. Environmental Conditions

D. Noise Characteristics

1. Effects of Background Noise on ASR's

2. Effects of Background Noise and Setting on Speech

E. Input Apparatus

F. Type of Speech Used in Research

1. Parameter Setting

VI. Method

A. Scenarios/Vocabulary

B. Equipment

1. Recognizers

2. Microphones

3. Recording Equipment

C. Speech Signals

D. Environment

VII. Procedure

A. Training Phase

1. First Scenario - Discrete Task

a. Votan

b. IBM, Intel, Interstate CSRB, TI

c. Interstate 4000 and Verbex 4000

d. Kurzweil

e. ITT

2. Second Scenario - Connected Speech

a. Votan

b. TI

c. Verbex 4000 and Interstate 4000

d. ITT

3. Third Scenario - Connected Speech

B. Testing Phase

1. First Scenario - Discrete Speech

2. Second and Third Scenarios - Connected Speech

VIII. Results

A. First Scenario

B. Scenario Two - Connected Speech

C. Scenario Three - Connected Speech

IX. Discussion

A. General

B. Discrete Speech

C. Connected Speech

X. Summary and Conclusions

XI. Glossary

XII. References

Appendices

I. Vendor List

II. Discrete Speech Vocabulary

III. Connected Speech Vocabularies

IV. ITT Syntax for Scenario 2

V. Other Connected Syntax (Verbex, Interstate, and TI)

VI. Test Sentences

VII. Rejection/Misrecognition Matrices: Discrete Task

VIII. Tukey Analysis of Means: Discrete Task

IX. Tukey Analysis of Noise Effects: Discrete Task

X. Tukey Analysis of Recognizer * Noise: Discrete Task

XI. Confusion and Error Matrices: Discrete Task

XII. Tukey Analysis of Speaker Effects: Scenario 2

XIII. Tukey Analysis of Speaker * Recognizer: Scenario 2

XIV. Tukey Analysis of Recognizer * Noise: Scenario 2

XV. Rejection/Misrecognition Matrices: Scenario 2

XVI. Misrecognition Error Trees: Scenario 2

XVII. Performance Means: Scenario 2

XVIII. Tukey Analyses: Scenario 3

XIX. Rejection/Misrecognition Matrices: Scenario 3

XX. Misrecognition Error Trees: Scenario 3


List of Figures

1. Correct Recognition by Noise (160 maximum)

2. Rejections by Noise

3. Misrecognitions by Noise

4. Correct Recognition by Speaker

5. Rejection by Speaker

6. Misrecognition by Speaker


List of Tables

1. Summary of Vendor Literature

2. Microphone/Recognizer Combinations Tested

3. Number of Utterances Required for Training Recognizers

4. Analysis of Variance: Discrete Data

5. Differences Between Recognizers

6. Differences Between Recognizers: Industrial Noise

7. Differences Between Recognizers: Fast Food Restaurant

8. GLM Results: Scenario 2

9. Differences Between Recognizers

10. Differences Between Recognizers: Speaker 1

11. Differences Between Recognizers: Speaker 2

12. Differences Between Recognizers: Speaker 3

13. Differences Between Recognizers: Speaker 4

14. Differences Between Recognizers: No Noise

15. Differences Between Recognizers: Industrial Noise

16. Differences Between Recognizers: Fast Food Noise

17. Analysis of Variance: Scenario 3


Acknowledgements 


The authors wish to thank IBM, Intel, Interstate Electronics,
ITT, Kurzweil AI, Texas Instruments, Verbex, and Votan for their
participation, which made this study possible. Appreciation is
also due to Charles Clark and the Product Manager, Army
Communicative Systems, for initiating and supporting this task.


I. Abstract


The last 15 years have seen the development of speech
technology at a very rapid rate. Unfortunately, the fact and fiction
of recognition are not always easily separated. This confusion
is evident not only among users, but also often among system
integrators.

This paper outlines the technology today and provides
results from one set of benchmark tests. Three tests were
performed with live speakers in three noise environments. The tasks
used were: a) a sixteen-word discrete vocabulary, b) a 37-word
connected speech vocabulary, and c) a 30-word connected speech
vocabulary which was very tightly constrained (syntaxed). The
noise environments included a "quiet" background, a noisy
background of loud voices, and the sounds associated with a
normal (loud) vehicle repair shop.

The results indicate that virtually all systems tested could
be made to perform well with specific, well-motivated speakers
and under all noise conditions. Where an application does not
require advanced features (e.g., large vocabularies or connected
speech), those features may become a liability by increasing
error. In spite of this, however, the technology is sufficiently
mature to support many field applications.




II.  Introduction 


This report is the outgrowth of questions regarding the
state of the art in speech recognition, both its capabilities and
its limitations. Based on the claims of vendors, it appears that
substantial progress has been made recently in the effectiveness
of speech recognition systems. There are, however, disappointingly
few large-scale applications from which data may be drawn
for comparison of the various systems. The shortage of
information is especially evident when comparable data on noise,
speaker, or recognizer effects are needed. This study attempts
to provide some baseline data which would be useful in the
evaluation process. Three distinct scenarios are addressed, and
extrapolation beyond these limits must be pursued with great
care. The term "performance evaluation" is used here, though it
must be recognized that there are currently no performance
standards for speech recognition systems. As a result, there are
no "standard" tasks, and a certain amount of quibbling over the
appropriateness of a given test, task, or scenario is inevitable.


A.  Objectives 


Given  the  above  constraints,  the  primary  objectives  of  this  study 
were  to: 

a) Perform a survey of the vendor literature,

b) Assess the performance of as many systems as practical, and

c) Provide first-pass data on differential system performance.

The  recognition  systems  evaluated  in  this  study  were:  the 
Interstate  Vocalink  S4000;  the  ITT  Multibus  CSR;  the  Verbex 
Series  4000;  the  TI  Speech  Development  System;  the  Votan  VPC 
2100;  the  IBM  Voice  Communication  Adapter;  the  Intel  iSBC  570; 
the  Interstate  CSRB;  and  the  Kurzweil  Voice  System. 

The vendor literature was surveyed for two purposes. It
provided an assessment of the vendor perception of the state of
the speech recognition art. Additionally, it served to identify
the vendors who currently have product offerings. As with any
survey, some product may inadvertently have been omitted, but
every effort has been made to solicit information from every
known potential vendor.

Several performance assessments were mandated. For
comparison with other studies, testing in a quiet environment was
necessary. To be of more practical value, however, it was
essential to test performance in two additional noise environments.
The first emulated an industrial environment and was taped in an
automotive service shop, while the second was somewhat more
innocuous, consisting of vocal noise at a fast food restaurant
order counter. This last environment approximates a noisy
classroom or other area with verbal interference.


Finally, many of the existing studies of recognizer system
performance use taped speech, on the rationale that each system
then receives identical input. Taped speech and its means of entry
into the recognition system, however, differ in several marked
respects from live speech delivered orally into a microphone. Live
speech was chosen for this study to preserve a more realistic
performance environment, and within-speaker variation was dealt
with statistically.


III.  Background 


Almost since the advent of the first commercial speech
recognizers in the early 1970's, manufacturers of automatic
speech recognizers (ASR's) have been claiming high performance
for their systems that is often not achieved in actual
applications. The net result is a perhaps healthy skepticism
of manufacturers' claims. For this reason, concerns have
arisen about how best to evaluate a system for a specific
application and a given group of users. System evaluation often
takes two forms: systems can be evaluated through a review
of the literature available from manufacturers or other users, or
through tests of system performance. The former provides essential
design information (e.g., vocabulary size, language/application
support, or price), whereas the latter is required to characterize
the ASR's behavior under actual operating conditions (e.g.,
recognition accuracy, speed, training time, application
development time).


IV.  Marketing  Literature  Review 


In reviewing the marketing material provided by
manufacturers, information was extracted addressing several major
areas. These include the technology, vocabulary capacity, training
support, hardware and software compatibility, and development
tools.


A.  Technology 


The issue of technology includes two dimensions: manner of
speaking and speaker dependence. Although all vendors identify
the segment to which they belong, no universally accepted
definitions for these terms exist. For this reason, a certain amount
of confusion results. For clarity, definitions are presented
which closely follow those proposed by Pallett (1985).


1. Manner of Speaking


The cadence of speech allowed (in some cases, "enforced" may be
a better term) by the technology can be broken into three
classes.

Discrete Speech forces the speaker to aid the recognizer by
pausing between utterances. This results in somewhat stilted
speech, may be perceived as being slow, and appears hard for some
speakers to learn. In spite of this, discrete utterance
recognition is the most common implementation today and is quite
adequate for many speech input tasks.
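To make the role of the pause concrete, the following is a minimal
sketch of energy-based endpoint detection, one common way a
discrete recognizer can locate where each utterance begins and
ends. It assumes the signal has already been digitized into
samples; the frame length, energy threshold, and minimum pause
length are hypothetical parameters, and the literature reviewed
here does not say how any particular vendor performs endpointing.

```python
from typing import List, Optional, Sequence, Tuple


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each complete, non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def find_utterances(energies: Sequence[float], threshold: float,
                    min_pause_frames: int) -> List[Tuple[int, int]]:
    """Return (first, last) frame indices of each utterance.

    An utterance ends only after `min_pause_frames` consecutive frames
    below `threshold`, so brief dips inside a word do not split it.
    """
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None        # first frame of the current utterance
    last_speech: Optional[int] = None  # most recent frame at/above threshold
    for i, e in enumerate(energies):
        if e >= threshold:
            if start is None:
                start = i
            last_speech = i
        elif start is not None and i - last_speech >= min_pause_frames:
            spans.append((start, last_speech))
            start = None
    if start is not None:  # speech ran to the end of the buffer
        spans.append((start, last_speech))
    return spans
```

For example, with frame energies `[0, 0, 5, 6, 0, 5, 0, 0, 0, 4, 4, 0, 0, 0]`,
a threshold of 1.0, and a three-frame minimum pause, the sketch finds
two utterances, frames 2-5 and 9-10; the one-frame dip at frame 4 is
not long enough to count as a pause. The requirement that the
speaker pause is what makes such a simple rule workable.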

Connected Speech, on the other hand, requires that words
be spoken carefully, but does not require an explicit pause
to separate each utterance. Although this appears easier
for speakers to use, it comes at the cost of higher processing
requirements and, thus, is usually somewhat more expensive.

Continuous  Speech  is  most  like  natural  speech.  Words  are 
spoken  fluently  and  rapidly  as  in  conversational  speech.  When 
this  occurs,  however,  speech  sounds  are  influenced  by  neighboring 
sounds  (coarticulation). 

Evaluating vendor products can become somewhat confusing at
this point, since many vendors do not make the distinction
between connected and continuous speech. Additionally, there is
nothing inherently "better" about a system simply because it
allows or promotes the use of one type of speech. It is the
application that usually dictates the recognition requirements.
Thus, no one system is likely to prove more suitable than others
for all applications, and it is likely to be a mistake to attempt
to identify one system as the standard for all future
applications. In general, unless the application really requires
connected speech recognition capabilities, selection of a
discrete speech recognizer is desirable, because the additional
cues provided by the speaker (i.e., pauses) usually make a
discrete system more tolerant of environmental noise.




2.  Speaker  Dependence 


Speaker Dependent recognition relies on matching speech
samples to previous utterances of the same speaker. An
enrollment or training procedure is followed to allow the system to
extract adequate models of the individual's speech patterns.

Speaker Independent recognition, however, requires no
enrollment. Rather than using speaker-specific models for
recognition, general models appropriate for a large population
are used. Most existing systems with speaker independent
capabilities have relatively small vocabularies (e.g., digits
plus several control words) and tend to have somewhat lower
recognition accuracy than is usually attained by comparable
speaker dependent systems.
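Matching a new utterance against a speaker's enrolled templates is
often done with dynamic time warping (DTW), which aligns the two
patterns despite differences in speaking rate. The sketch below
is one textbook form of DTW, offered only as an illustration: the
feature representation, the rejection threshold, and all function
names are assumptions, and the vendors' actual algorithms are not
documented in the literature reviewed here.

```python
import math
from typing import Dict, List, Optional, Sequence

Frame = Sequence[float]     # one feature vector (assumed representation)
Pattern = Sequence[Frame]   # one utterance as a sequence of feature vectors


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(x: Pattern, y: Pattern) -> float:
    """Cost of the cheapest monotonic alignment of x against y."""
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(x[i - 1], y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # x advances alone
                                 cost[i][j - 1],      # y advances alone
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def recognize(utterance: Pattern,
              templates: Dict[str, List[Pattern]],
              reject_above: float) -> Optional[str]:
    """Return the word whose enrolled template is closest, or None to reject."""
    best_word: Optional[str] = None
    best_cost = float("inf")
    for word, enrolled in templates.items():
        for template in enrolled:  # a word may have several training samples
            c = dtw_distance(utterance, template)
            if c < best_cost:
                best_word, best_cost = word, c
    return best_word if best_cost <= reject_above else None
```

In these terms, a `None` result corresponds to a rejection, while
returning the wrong word is a misrecognition. Because the templates
come from the same speaker's enrollment, this approach is inherently
speaker dependent.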

The vendor products tabulated in Table 1 are described more
completely in Appendix I. In several instances, a vendor
has claimed continuous capabilities but may be shown as connected
to preserve the above definitions. In rare cases, insufficient
information was available to make this assessment, so a question
mark was inserted to identify the uncertainty.

3.  Vocabulary  Capacity 

The question that is probably most often asked relates to
the size of the recognition vocabulary. What most individuals
tend to forget is that at any instant in time, unless the user is
attempting verbal dictation, the probability is very low that
more than a relative handful of words are feasible in the
existing context. Additionally, there is usually a trade-off to be
made: as the candidate vocabulary gets larger, the probability
of recognition error increases. In many cases, a more pertinent
question is how well the system supports subdividing the
vocabulary. As is evident from Table 1, a wide variety of
vocabulary sizes is supported by the various systems.
This reflects several major design philosophies: provide several
relatively small vocabularies which may be switched very rapidly,
provide a larger vocabulary that can be arbitrarily split under
program control, or provide a large vocabulary that relies heavily
on the accuracy of the recognition algorithm. As a point of
reference, there are very few well-structured applications
requiring more than 200-300 words in the vocabulary if the
application is thoroughly studied, understood, and designed.
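The benefit of subdividing the vocabulary can be shown with a small
sketch: if the recognizer compares an utterance only against the
words that make sense in the current context, a confusable word
outside that context cannot be chosen. The subset names, the words,
and the distance values used below are invented for illustration.

```python
from typing import Dict, FrozenSet, Iterable, Optional

# Hypothetical subsets for one application; only one is active at a time.
SUBSETS: Dict[str, FrozenSet[str]] = {
    "command": frozenset({"start", "stop", "status", "help"}),
    "digits": frozenset({"zero", "one", "two", "three", "four",
                         "five", "six", "seven", "eight", "nine"}),
}


def best_active_match(distances: Dict[str, float],
                      active: Iterable[str]) -> Optional[str]:
    """Choose the closest-matching word, considering only active words.

    `distances` maps each vocabulary word to its match distance for the
    current utterance (lower is better).
    """
    candidates = [w for w in active if w in distances]
    if not candidates:
        return None
    return min(candidates, key=lambda w: distances[w])
```

Suppose a spoken "status" produces the hypothetical distances
`{"status": 0.34, "six": 0.31, "stop": 0.52}`. With only the
"command" subset active, the sketch returns "status"; with every
word active, it returns "six", a misrecognition that the smaller
active vocabulary avoided.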


Table  1.  Summary  of  Vendor  Literature. 


VENDOR 

PRODUCT 

RECOGNITION  TECHNOLOGY 

VOCAB.

VOICE 

PRICE 

Dep. 

Ind. 

Disc. 

Conn 

Cont.

SIZE 

OUTPUT 

($ US)

AT&T 

Conversant  1 

X 

X 

X 

X 

2561 

■> 

quote 

AUDEC 

SSB-1000 

X 

X 

144 

250 

Calltalk 

DVIO Mod. 100

X 

X 

7 

500 

1 

quote 

Dragon Systems

Voicescribe  1000  X 

X 

1000 

995 

Dragon Systems

Voicescribe 20000

X 

X 

20000 

quote 

IBM 

Voice Comm. Adpt. X

X 

5*64* 

X 

1,700* 

Intel 

iSBC  570 

X 

X 

200 

2,900 

Interstate  V  P 

CSRB 

X 

X 

digits 

240 

opt. 

1,410 

Interstate  V  P 

SRB-LC 

X 

X 

400 

395 

Interstate  V  P 

Vocalink  S4000 

X 

X 

100 

5,200 

ITT  DCD 

Multibus CSR

X 

X 

X 

?

300 

X 

37,000 

Kurzweil AI

KVS

X 

X 

1000 

6,500 

Microphonics

(various) 

X 

X 

X 

128 

quote 

NEC America

(various) 

X 

X 

X 

X 

<500* 

quote

Scott  Instr. 

Coretechs  VET  3 

X 

X 

X 

X 

200 

X 

9,995 

Speech Systems

Phonetic  Engine 

X 

X 

? 

5000 

j 

quote 

Texas  Instr. 

TI-Speech

X 

X 

X 

20*50* 

X 

1,155 

Toshiba 

TOSVuICE 

X 

X 

64 

•> 

quote 

Voice  Indust. 

Verbex  4000 

X 

X 

100 

5,500 

Voice  Cntrl  Sys 

VCS  Technology 

X 

X 

20 

1 

quote 

VOTAN 

VSP  1010 

X 

limited

X 

X 

?

64* 

X 

quote 

VOTAN 

VPC  2100 

X 

limited

X 

X 

?

80** 

X 

quote 

Westinghouse

Series  100  VDCS 

X 

X 

200 

■> 

quote 

ICOB 

Seraphine 

X 

X 

X 

X 

100 

\ 

3,000 

1 13 in speaker independent mode.

2 Bundled price; may also be purchased unbundled.

3 Less than 20 in independent mode.

4 Total vocabulary must be divided into subsets of which only one may be active at any time.

5 This may be increased with fewer training passes or optional expansion vocabulary.


4.  Training  Support 


The type of training required depends, to a large degree, on
the type of recognizer. While discrete word recognizers require
only individual template(s) of each word, connected systems must
also be able to account for coarticulation. Coarticulation is
the phenomenon observed at the boundary of words spoken together:
each word is influenced both by the word preceding it and by the
word following it. Thus, a connected recognizer relies not only on
templates for each word, but also requires models of how
coarticulation affects each word pair. Put very simply, every
possible word-pair boundary must be modeled. Needless to say, the
number of combinations grows very quickly with vocabulary size,
making the enrollment process very cumbersome unless the possible
combinations are efficiently pared down. As examples, the
Interstate S4000 and Verbex 4000 generate a relatively exhaustive
script for coarticulation estimation, while ITT relies on a
training script developed by the application designer. VOTAN uses
only the discrete utterance templates (relying on a strong
algorithm) and allows operator-selected embedded training of
particularly troublesome combinations. Scott Instruments does no
coarticulation evaluation; instead, the VET 3 internally adjusts
word boundaries to allow connected recognition.
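The growth in word-pair boundaries can be counted directly. The
sketch below assumes a boundary is simply an ordered pair of
adjacent words and compares an unconstrained vocabulary with one
pared down by a syntax; the toy syntax is hypothetical and is not
a syntax used in this study.

```python
from itertools import product
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]


def all_word_pairs(vocabulary: Iterable[str]) -> Set[Pair]:
    """Every ordered word-pair boundary, including a word followed by itself."""
    words = list(vocabulary)
    return set(product(words, repeat=2))


def syntax_word_pairs(successors: Dict[str, Iterable[str]]) -> Set[Pair]:
    """Only the boundaries a syntax allows, given word -> permitted next words."""
    return {(w, nxt) for w, nexts in successors.items() for nxt in nexts}
```

For a 37-word connected vocabulary such as the one described in the
abstract, an unconstrained recognizer would face 37 x 37 = 1,369
ordered boundaries. By contrast, a five-word toy syntax in which
"move" may be followed only by "left" or "right", and each of those
only by "fast" or "slow", allows 6 of the 25 possible pairs.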




All systems (except speaker independent systems, which by
definition require no training but may require speech patterns
for adaptation purposes) provide utilities for training the
vocabulary. In most cases this is an off-line function that
acquires and maintains patterns. For most applications, this is
sufficient. These static models, however, may not be adequate
when speech patterns are likely to change due to stress, boredom,
or fatigue. Under these conditions, dynamic updating of user
templates may be required to cope with change, whether permanent
or transient. This dynamic update feature is available from
very few vendors at the current time.

Whether due to adaptation or standard training techniques,
speech recognition systems usually use multiple utterances
against which new speech signals are compared. To guard against
inadvertent contamination of the speech patterns, major
differences between patterns usually result in a user query, thus
avoiding the inclusion of coughs, burps, etc. The method
used to represent these composite patterns varies greatly. Most
systems use an "averaging" technique, where template updates are
combined with and replace previously existing templates. One
potential danger of this approach is that, as more samples are
included in the template, it may become more general and, over
time, no longer represent the intended utterance very well. This
would tend to result in an increased number of errors. An
alternative approach, used by VOTAN and Kurzweil, consumes a
vocabulary entry for each update of a word. This technique reduces
the chance that the template becomes so general that recognition is
adversely affected, but at the expense of reducing the maximum number
of words in the vocabulary. For example, if the recognizer had a
20-word vocabulary limit, a single update of every word (after
initial training) would reduce the usable vocabulary to 10 words,
and two updates to 7 (if one word had only a single update), etc.
The effects of this may be minimized by understating the available
vocabulary so that updates do not affect the apparent vocabulary size.
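The two template-maintenance strategies, and the 20-word
arithmetic above, can be sketched as follows. Templates are
simplified here to fixed-length feature vectors (real templates are
time sequences that would have to be aligned before averaging), and
the tolerance test merely stands in for whatever consistency check a
vendor uses before querying the user; all names and thresholds are
assumptions.

```python
from typing import Dict, List, Sequence


def needs_user_query(template: Sequence[float], sample: Sequence[float],
                     tolerance: float) -> bool:
    """Flag a training sample that differs too much from the template (a cough, say)."""
    return max(abs(t - s) for t, s in zip(template, sample)) > tolerance


def averaged_update(template: Sequence[float], sample: Sequence[float],
                    n_prior: int) -> List[float]:
    """'Averaging' strategy: fold a new sample into a template built from n_prior samples."""
    return [(t * n_prior + s) / (n_prior + 1) for t, s in zip(template, sample)]


def entries_used(updates_per_word: Dict[str, int]) -> int:
    """Entry-per-update strategy: each word costs 1 entry plus 1 per update."""
    return sum(1 + u for u in updates_per_word.values())


def max_words(entry_limit: int, updates_each: int) -> int:
    """Words that fit when every word has the same number of updates."""
    return entry_limit // (1 + updates_each)
```

With a 20-entry limit, `max_words(20, 1)` gives 10. With two updates
per word only 6 words fit (18 entries), but a seventh word with a
single update uses the remaining 2 entries, for a total of
6 x 3 + 2 = 20, matching the 7 words cited in the text.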

Finally, to achieve consistent performance, the user needs
feedback, especially during training. This feedback helps the
user develop the necessary speech habits and allows rapid
determination of the effects of mispronunciation.




5.  Compatibility 


Both hardware and software compatibility issues arise with
speech recognition. At the hardware level, a number of factors
need to be considered. Probably the most flexible systems employ
a stand-alone architecture communicating with the host, typically
via an RS-232 interface. Examples include the Verbex 4000 and the
ITT CSR. Unfortunately, the application development libraries
for many of these systems assume a specific host (i.e., the support
software will run only on a specific operating system). When
that constraint is considered, the workstations developed
around the Intel iSBC 570 or Westinghouse systems may be
considered just as flexible.

The next level of compatibility currently revolves around
computer bus standards (typically Multibus or PC bus). Within
this level are two subdivisions. The first, the "low-priced"
systems (typically under $500 and designed to be used in a PC),
often use the host CPU to execute the recognition algorithm. In
this case, the speech board is primarily a "front-end"
amplifier/filter. While this reduces the cost of the recognition
subsystem, it usually severely curtails other computing
functions. The result is often substantially slower processing,
and in some cases the application software must be custom-designed
to use speech input. This custom software is obviously a very
expensive solution and may substantially exceed the savings
anticipated through the use of the low-cost recognition hardware,
resulting in a higher net cost in all but the most trivial
applications.

The other subdivision (typically $1000 and up, with a PC or
Multibus form factor) uses the host processor as a file server,
providing mass storage and supporting the application software.
Systems in this class usually include integrated signal processor
chips as well as powerful microprocessors. Because of this, the
host impact is minimal, and host applications can be integrated
with voice without the need to redevelop the application support
software.

To assess software compatibility, several questions must be
addressed. Perhaps the most obvious is the operating system
supported. The largest number of systems require MS/PC DOS. These
include those which reside in the PC (e.g., IBM, TI, VOTAN, and
Interstate CSRB) as well as those which communicate via an RS-232
link but have DOS-resident support software (e.g., Verbex 4000,
Interstate 4000, and Kurzweil). The latter set may be used with
non-DOS systems, but the necessary support software is likely
nonexistent. Other operating systems which provide the necessary
support include UNIX or its derivatives (supporting ITT's
CSR and Intel's iSBC 570) and Intel's iRMX (supporting the Intel
iSBC 570).


Within the appropriate operating system, the level of
support also differs dramatically, ranging from a set of subroutines
to transaction generators. The use of support subroutines
assumes that the necessary application host languages are
available. In addition, the use of these routines must be initiated
from within the application program, which requires program
modification. At the second level, the speech system interacts
with the application through operating system calls. This also
requires access to be initiated from within the application
program and, thus, may require substantial programming. Finally,
some vendors supply a utility which may be generically called a
transparent keyboard. After being appropriately designed, this
software utility parallels the operation of the terminal keyboard
and allows speech recognition to be used without modification to
the application software.
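A transparent keyboard amounts to a translation table between
recognized words and the keystrokes that the unmodified application
already accepts. The sketch below shows only that translation step;
the word list and key codes are hypothetical, and the way the
keystrokes are injected into the terminal input stream depends on
the host and is not shown.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical mapping built for one application: spoken word -> keystrokes.
KEYMAP: Dict[str, str] = {
    "enter": "\r",
    "next": "\t",
    "yes": "Y",
    "no": "N",
    "print": "\x10",  # Ctrl-P, assumed here to be the application's print key
}


def to_keystrokes(recognized: Iterable[Optional[str]],
                  keymap: Dict[str, str]) -> str:
    """Translate recognizer output into the keystrokes the application expects.

    Rejections (None) and unmapped words produce no keystrokes, so the
    application never sees input it could not also get from the keyboard.
    """
    keys: List[str] = []
    for word in recognized:
        if word is not None and word in keymap:
            keys.append(keymap[word])
    return "".join(keys)
```

Because the application only ever sees ordinary keystrokes, it needs
no modification; the design effort moves into choosing the vocabulary
and building the mapping.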

At the most sophisticated end of the software support spectrum
are the transaction generators. These packages (most notably from
Intel and Westinghouse) generate the necessary software
automatically after acquiring the interaction rules from the
application developer. As a result, although these may initially
be more expensive, faster application development may make the net
system cost more competitive.
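To make "acquiring the interaction rules" concrete, the sketch
below assumes the developer states rules as a table of (state,
spoken word) pairs and has a generic routine turn that table into a
dialogue handler. The rule format, state names, and responses are
hypothetical; they are not the Intel or Westinghouse formats.

```python
# Hypothetical interaction rules, written the way an application developer
# might state them: (current state, spoken word) -> (response, next state).
RULES = {
    ("idle", "inventory"): ("ENTER PART NUMBER", "part"),
    ("part", "done"): ("PART RECORDED", "idle"),
    ("part", "cancel"): ("CANCELLED", "idle"),
}


def active_vocabulary(rules, state):
    """Words the recognizer should listen for in a given state (its syntax)."""
    return sorted(word for (s, word) in rules if s == state)


def generate_dialogue(rules, start):
    """Build a handler from the rule table instead of hand-writing one."""
    state = start

    def handle(word):
        nonlocal state
        if (state, word) not in rules:
            return "NOT VALID HERE"  # word outside the active vocabulary
        response, state = rules[(state, word)]
        return response

    return handle
```

The same table yields both the dialogue logic and the active
vocabulary for each state, which is one way such a generator can
also supply syntaxing without extra programming.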

In general, there are limitations inherent in all the software
support packages provided by the vendors. This may, to a large
extent, be due to the youth of the technology, which has very few
established application niches. As these applications are
reproduced, commonalities are likely to emerge that will encourage
the development of more generic application generators. These
generators, in turn, will promote the spread of the technology to
other applications.


6.  Development  tools 


Although the software mentioned above may allow the integration
of voice into an application, there is another aspect to the
support of speech systems. The Intel and Westinghouse packages
encourage full and careful use of the voice channel by
implementing dialogue structures, editing support, vocabulary
selection, and syntaxing. This improves the probability that
applications will be properly designed and implemented by forcing
the application developer to consider all aspects of the user
dialogue.

High-level support for the Kurzweil, TI, Verbex 4000, Interstate
S4000, and ITT CSR also exists. This support software, while well
designed, is used primarily to support advanced recognition
features (e.g., syntaxing or training); the broader issues of a
comprehensive verbal dialogue are not addressed.


V.  Performance  Testing 

In the design of performance tests for ASRs, a number of issues
must be addressed: 1) selection of words to be tested; 2)
identification of test scenarios; 3) environmental conditions; 4)
the type of speech used for input (live vs. recorded); 5)
parameter settings; and 6) evaluation procedures.


A.  Vocabulary  Selection 

The selection of words to be tested is typically based on one or
more of the following three factors. First, the words selected may
form a phonetically balanced word list so that all phonemes
represented in the language are in some way tested. Second, the
words can be selected based on the frequency with which they are
typically used in voice input applications (e.g., the so-called TI
word list suggested by Doddington & Schalk (1981)). Finally, the
words selected for testing may be chosen with an application in
mind, in which case the words that will be used in the application
will provide the best indication of performance.
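A first screening step for the phonetic criterion is to check which
phonemes a candidate word list actually exercises. The sketch below
does only that coverage check (true phonetic balance also matches
phoneme frequencies to those of the language), and its
ARPAbet-style transcriptions and small inventory are illustrative
assumptions, not a real pronunciation dictionary.

```python
# Hypothetical ARPAbet-style transcriptions; a real test would use a full
# pronunciation dictionary and the complete phoneme inventory of English.
PRONUNCIATIONS = {
    "zero": ["Z", "IH", "R", "OW"],
    "one": ["W", "AH", "N"],
    "two": ["T", "UW"],
    "three": ["TH", "R", "IY"],
    "stop": ["S", "T", "AA", "P"],
    "go": ["G", "OW"],
}


def phoneme_coverage(words, pronunciations, inventory):
    """Fraction of the target phonemes exercised by a word list, plus gaps."""
    covered = {ph for word in words for ph in pronunciations[word]}
    missing = sorted(inventory - covered)
    return len(covered & inventory) / len(inventory), missing
```

Any phoneme reported as missing points to a word that should be
added before the list can claim to test every sound.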


B.  Scenario 

Since an application does not consist of a random sequence of
words, scenarios must be designed to implement transactions that
exercise the vocabulary in a way that is representative of typical
use. Depending on the application, the transactions will be of
varying lengths and degrees of difficulty. Transactions used for
testing must be either representative of an actual application or
general enough for generic testing. Consideration must also be
given to whether to use "syntaxing", and to what degree.


C.  Environmental  Conditions 


If the results are intended to be extended to a specific task,
the environmental conditions in which the recognizer is tested
should replicate the application environment as closely as
possible. These environmental conditions should include: the noise
characteristics of the application area; the acoustic properties
of the room in which the application is located; and the type of
speech input apparatus that is required for the application.


D.  Noise  Characteristics 




Automatic Speech Recognizer (ASR) performance accuracy is
influenced to some extent by background noise levels (Rollins et
al., 1983). The ambient noise level in dB, the frequency
distribution of the sound, and the variability of the noise are
characteristics that may degrade ASR performance. This potential
degradation arises from the influence of the background noise on
the ASR, the speaker, and the microphone.


1. Effects of Background Noise on ASRs




When background noise levels become too high (e.g., 85 dB(A)),
the signal-to-noise ratio may not be large enough for the ASR to
detect which, if any, word has been spoken (Rollins et al., 1983).
Depending on the spectral characteristics of the noise, this may
occur with or without a noise-canceling microphone. In this case,
a rejection error is most likely (Rollins et al., 1983), though
this depends on the discrimination levels set on the ASR.
Rejection and misrecognition errors are also likely to occur in
inconsistent noise (sound pressure levels varying by more than 5
dB(A)), where the front-end gain function of the ASR does not
accurately reflect the background noise. This problem is most
significant when the ASR calibrates the front-end gain only once
and for a brief period of time.
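Both conditions in the paragraph above can be screened from sound
level readings before a test. The sketch assumes speech and noise
levels are in dB with the same weighting (so SNR is a simple
difference) and interprets "varying more than 5 dB(A)" as the
max-minus-min range of the readings; a standard-deviation criterion
would be an equally defensible assumption.

```python
def snr_db(speech_level_db, noise_level_db):
    """Signal-to-noise ratio in dB when both are sound pressure levels."""
    return speech_level_db - noise_level_db


def noise_is_inconsistent(spl_readings_dba, tolerance_db=5.0):
    """True if readings span more than the 5 dB(A) band cited above.

    Treating the max-min range as "varying more than 5 dB(A)" is an
    assumption made for this sketch.
    """
    return max(spl_readings_dba) - min(spl_readings_dba) > tolerance_db
```

Applied to the ranges measured later in this study, the industrial
tape (62 to 84 dB(A)) would be flagged as inconsistent, while the
fast food restaurant tape (78 to 83 dB(A)) would not.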


Certain frequency characteristics of noise can also affect ASR
performance, especially high-frequency components (above 10,000
Hz) outside the normal speech range. Microphones and ASRs
typically attenuate background noise differently. Noise-canceling
microphones are most effective below 2,000 Hz and provide little
filtering above this frequency (Larson et al., 1986). Thus,
high-frequency noise can severely degrade ASR performance and, in
some cases, prohibit the use of these recognizers. This
degradation often occurs because the high-frequency noise is not
removed from the speech signal and impedes the identification of
speech onset or termination.


2.  Effects  of  Background  Noise  and  Setting  on  Speech 


It has been well documented that speech characteristics are
altered by high background noise levels (Pisoni et al., 1985;
Lane et al., 1971; Draegert, 1951). The source of the background
noise, whether machinery or other speakers in a room, is also of
significance (Webster et al., 1962), as are the room
characteristics, the speaker task (reading aloud vs. talking), the
frequency components of the background (masking) noise, and the
use of hearing protection devices.

Characteristic changes occur in speech produced in the presence of
masking noise. These effects have been demonstrated with
background noise as low as 50 dB(A) (Lane & Tranel, 1971). The
changes noted in the speech include an increase in vocal intensity
(voice level increases by half the increase in background noise);
an increase in fundamental frequency; and an increase in syllabic
duration, with a consequent decrease in rate of speech (Lane &
Tranel, 1971). When the masking noise is produced by other
speakers, however, the rate of speaking has been found to increase
rather than decrease (Webster et al., 1962). A tilt in the
short-term spectrum of consonants and vowels has also been
observed (Pisoni et al., 1985).
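The half-slope relation above can be turned into a quick estimate
of how much louder a talker becomes in a given noise. The slope of
0.5 comes from the text; treating 50 dB(A) as an onset below which
no rise occurs is an assumption of this sketch, based only on the
lowest level at which the effect was demonstrated.

```python
def predicted_voice_rise_db(noise_dba, onset_dba=50.0, slope=0.5):
    """Rise in vocal level predicted by the half-slope rule.

    slope=0.5: voice level rises by half the increase in noise.
    onset_dba=50.0 is an assumed threshold, not a reported one.
    """
    return slope * max(0.0, noise_dba - onset_dba)
```

A corollary worth noting for ASR testing: if noise rises by 20 dB,
the voice rises by only about 10 dB, so the signal-to-noise ratio
still falls by roughly 10 dB despite the talker's compensation.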

The size and reverberation characteristics of the room have also
been shown to alter speech characteristics. Black (1950) found
that speech rate was slower in large rooms (1900 cu. ft.) than in
small rooms (150 cu. ft.), and that speech was slowest in large
live rooms (reverberation time = 0.8-1.0 sec.) as compared to dead
rooms (reverberation time = 0.2-0.3 sec.). It was also found that
the intensity of the speech was greater in dead rooms than in live
rooms, especially in the larger room.


Garber et al. (1976) demonstrated that noise of equal intensity
affected voice level differently depending upon the noise's
ability to interfere with (mask) the speech signal. Noise with a
range of 20-20,000 Hz produced a significantly higher vocal
intensity than noise ranges of 1800-2500 Hz, 4000-6000 Hz, and
20-250 Hz. Vocal intensities produced in masking noise with a
frequency range of 1800-2500 Hz were also significantly higher
than those produced in masking noise with a frequency range of
4000-6000 Hz. When noises of equal loudness were presented, a
similar differentiation occurred. Vocal intensities produced in
masking noise in the 20-20,000 Hz and 350-700 Hz ranges were
significantly higher than those in the 1800-2500 Hz and 4000-6000
Hz ranges. The vocal intensity noted for masking noise in the
1800-2500 Hz range was also significantly higher than that found
in the 4000-6000 Hz range. In general, Garber et al. found that the
more the frequency components of the noise mask the speech signal,
the greater the change in vocal intensity produced by the speaker.


Howell and Martin (1975) have demonstrated that speech is
degraded when speakers wear hearing protection devices. The
hearing protector affects the speaker's ability to hear his (her)
own voice (the occlusion effect). The hearing protection device
"attenuates the airborne energy, but has little effect on the bone
conduction portion, except in the lower frequencies where the
perceived voice levels are actually amplified as a result of the
occlusion effect" (Berger et al., 1986, p. 368). As a result, the
speaker perceives his own voice as louder relative to the
background noise than it actually is, and subsequently reduces his
voice level by 2-4 dB (Kryter, 1946; Howell & Martin, 1975).

Additional research has indicated that the above effects may be
altered unsystematically by factors such as speaker training
(instructions), speaker task, hearing loss, and sidetone effects
(Lane & Tranel, 1971; Siegel & Pick, 1974; Borden, 1979).


E.  Input  Apparatus 



The microphone-ASR system combination must be chosen to satisfy
several performance and operational requirements in order to
facilitate ASR performance in the application setting. The type of
microphone, performance characteristics, reliability, durability,
ease of use, and comfort are important criteria that need to be
considered (Waller, 1985).

Microphones presently in use with ASRs include: headset
microphones; handheld microphones; gooseneck microphones; wireless
microphones (typically headset); and telephone systems. Headset
microphones can be either one-way (no verbal feedback) or two-way
(can be used for both speaking and hearing). Headset microphones
can also provide a full range of hearing protection. In part,
physical constraints of the application dictate the type of
microphone selected. However, performance characteristics are
equally important in microphone selection.

The microphone chosen for a particular speech recognition
application must satisfy two kinds of performance requirements.
First, the microphone must perform accurately and reliably in the
specific ASR application. Humidity, temperature, noise levels,
physical workspace layout, and the nature of the ASR task all
influence microphone performance and subsequent recognition rates.
Second, the technical constraints of the microphone must be
considered. Frequency response, range, directionality, and
stability (the ability to tolerate head movement without changing
the microphone position relative to the mouth) are necessary to
assure reliable and accurate input to the ASR. To date, headmount
microphones most consistently satisfy the necessary requirements
for speech recognition (Price, 1983).

Headmount microphones, although the best microphones at the
present time for speech recognition, have some negative attributes
that hinder their usefulness in ASR applications. The microphone
can be uncomfortable, can move out of position, and may not cancel
ambient noise sufficiently for successful recognition.

The specifications of the particular microphone model must be
considered in attempting to match the needs of the application
task with the appropriate microphone. The microphone must keep
extraneous noise from entering while favoring the entrance of
human speech sounds. Therefore, the attenuation characteristics of
the microphone must be matched with the frequency characteristics
of the application noise.


The application environment (temperature, humidity, dust) must
also be considered with respect to microphone durability.
Excessively high temperatures (above 55-60 °C [130-140 °F]),
excessively low temperatures (below -25 °C [-13 °F]), air pressure
changes, and high humidity levels have been found to alter
microphone performance (Peterson, 1980). The use of microphones in
work situations may subject the microphone to damaging bumps,
jolts, or vibrations. The microphone chosen for use in ASR
applications must be able to withstand these environmental
characteristics.

An additional microphone characteristic is its physical
stability. The microphone must remain in the same position
relative to the mouth throughout use, since movement of the
microphone severely degrades ASR performance.


F.  Type  of  Speech  Used  in  Research 


When  testing  speech  recognizers,  an  important  question  is 
whether  to  use  live  speakers  or  recorded/digitized  data.  There 
are  advantages  and  disadvantages  in  either  of  these  methods  of 
speech  input. 

Proponents of using recorded/digitized speech support the need for
both a standardized testing procedure and a standardized database
of speech input. They suggest that this method is the only fair
means of comparing recognizers, given the variability that exists
in a speaker's utterances of the same word. Several tests of
speech recognizers have been completed using this type of database
(Doddington & Schalk, 1981; Baker, 1982; Nusbaum et al., 1986).

Proponents of the use of live speakers suggest that this is the
most effective way to compare systems accurately as they might be
used in an actual application setting. Inter- and intra-speaker
variability is a naturally occurring phenomenon that should be
accounted for, not controlled. This method also provides the
speaker with an opportunity to "tune his voice" to the specific
ASR being tested. This typically occurs through the feedback that
the machine generates in both the training and testing procedures.
This adaptation to the system is commonly observed, and its
effects on system performance are important. Finally, the entry of
recorded data into the ASR differs considerably from the entry of
live data. When played back, taped data must either go directly
into the ASR, thereby bypassing the microphone, or be played out
through a loudspeaker, which produces a speech signal different in
many essential respects from orally produced speech.

The disadvantage of live speech is that replicating a test
requires a large number of speakers and more time than alternative
approaches. However, this type of evaluation is also more likely to
provide a realistic view of the system's actual performance in an
application setting.


G. Parameter Setting


Many, though not all, recognizers allow the user to set various
parameters. These parameters are typically used to set the minimum
match score (i.e., how well the current utterance matches the
'best' template) and the minimum match score difference between
the best and 'runner-up' templates (as the minimum difference
increases, the probability decreases that the 'runner-up' word is
the correct match). Based on these parameters, the decision is
made to report an utterance as "recognized" or "rejected". If
testing is performed with a specific application in mind, these
parameters may be adjusted to suit that application. For generic
testing, either several combinations of these parameters may be
tried, which increases the testing effort, or a "forced choice"
philosophy may be adopted so that the system's ability to
discriminate among similar-sounding words is tested most
conservatively.
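The accept/reject logic described above can be sketched as
follows. The sketch assumes match scores where higher means a
better match (some recognizers report distances, where lower is
better, and the comparisons would flip), and the function name and
threshold values are illustrative rather than any vendor's settings.

```python
def decide(scores, min_score, min_difference, forced_choice=False):
    """Return the recognized word, or None if the utterance is rejected.

    scores: dict mapping each vocabulary word to its match score
    (higher = better, an assumption of this sketch).
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_word, best = ranked[0]
    if forced_choice:
        return best_word  # always report the best template; never reject
    runner_up = ranked[1][1] if len(ranked) > 1 else float("-inf")
    if best < min_score:
        return None  # best template matched too poorly
    if best - runner_up < min_difference:
        return None  # best and runner-up too close to call
    return best_word
```

Under forced choice every utterance yields a word, so rejections
disappear and every confusion between similar-sounding words shows
up as a misrecognition, which is why that setting tests
discrimination most conservatively.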


VI. Method


A.  Scenarios/Vocabulary 

Three distinct speech scenarios were used for this study. The
first scenario used discrete speech and a 16-word vocabulary
(Appendix II). The second and third scenarios used connected
speech with 37 and 30 vocabulary words, respectively (Appendix
III).

The second scenario was designed to measure recognition accuracy
for connected speech using limited syntaxing. The sentences
constructed for this scenario were designed to be application
specific. Twenty-four sentences were used for this task: twelve
sentences were three words long, six were four words long, and six
were five words long. The syntaxing used for the scenario varied
among the recognizers tested. The Votan speech recognizer had no
syntaxing capabilities (except via subsets), so the 37 unique
vocabulary words used were available for recognition at all
times. The syntax used for the ITT recognizer was restricted; the
number of possible word choices was limited once the first word
was recognized (Appendix IV). The syntaxing used on the remaining
three systems (Verbex, Interstate 4000, and TI) consisted of a
first word choice (15 words), a second word choice (15 words), a
third word choice (15 words), an optional fourth word choice (11
words), and an optional fifth word choice (6 words). The
recognition of the first word did not limit the possible choices
for the subsequent utterances, except that they could only be
chosen from the appropriate word list (Appendix V).
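The positional syntax used on the Verbex, Interstate 4000, and TI
can be described entirely by the size of each position's word
list. The sketch below uses only those sizes (the word lists
themselves are in Appendix V) and assumes an optional fifth word
can follow only an optional fourth, which matches the three-,
four-, and five-word sentence lengths.

```python
from math import prod

# Word-list sizes for each position under the positional syntax.
SLOT_SIZES = [15, 15, 15, 11, 6]
MIN_LENGTH = 3  # sentences were three, four, or five words long


def sentence_count(slot_sizes, min_length):
    """Number of distinct word sequences the syntax admits.

    Assumes the optional fifth word can occur only after the fourth.
    """
    return sum(prod(slot_sizes[:n]) for n in range(min_length, len(slot_sizes) + 1))


def choices_at(position, slot_sizes, syntaxed=True, vocabulary_size=37):
    """Candidates the recognizer must choose among at a word position."""
    return slot_sizes[position] if syntaxed else vocabulary_size
```

The syntax admits 263,250 sequences, yet at any point the
recognizer chooses among at most 15 words, compared with all 37
for the Votan, which had no syntaxing in this scenario.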


The third scenario was designed to test recognition rates for
digits using a connected speech task and restricted syntax. Five
basic sentences were used for this scenario, but they varied in
number length (one to five digits). This resulted in 25 test
sentences (Appendix VI). The syntaxing used for the recognizers
tested in this scenario was equivalent since, once the first word
choice was recognized, the second word choice was known (limited
to one choice) and all further utterances were known except for
the number string spoken. The spoken number string was constructed
from the digits zero to nine and contained sets of one to five
digits.
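Because the sentence frame is fixed once the first word is
recognized, the only open part of each utterance is the digit
string. The sketch below checks that part and counts how many
strings it admits; the function names are hypothetical, and the
frame words of the five basic sentences (Appendix VI) are not
modeled.

```python
DIGIT_VALUES = {
    word: str(i)
    for i, word in enumerate(
        ["zero", "one", "two", "three", "four",
         "five", "six", "seven", "eight", "nine"]
    )
}


def parse_number(words, min_digits=1, max_digits=5):
    """Return the digit string for the open slot, or None if it is invalid."""
    if not (min_digits <= len(words) <= max_digits):
        return None
    if any(word not in DIGIT_VALUES for word in words):
        return None
    return "".join(DIGIT_VALUES[word] for word in words)


def digit_string_count(min_digits=1, max_digits=5, alphabet_size=10):
    """How many distinct digit strings the open slot admits."""
    return sum(alphabet_size ** n for n in range(min_digits, max_digits + 1))
```

Even with the frame fully constrained, the open slot admits
111,110 possible strings, so this scenario mainly exercises digit
recognition rather than syntax.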


B.  Equipment 


1. Recognizers


The Interstate Vocalink S4000 (Interstate 4000), ITT CSR (ITT),
and Verbex Series 4000 (Verbex 4000) recognizers were used for all
three tasks included in this research project. The TI Speech
Development System (TI) and Votan VPC 2100 (Votan) speech
recognizers were used in all but the second connected speech
recognition task. Additionally, the following recognizers were
tested in the discrete recognition task: an IBM Voice
Communication Adapter (IBM); an Intel iSBC 570 (Intel); an
Interstate Voice Systems speech recognizer (Interstate CSRB); and
the Kurzweil (Kurzweil). All of the ASRs tested are available
except for the ITT system, which is a prototype research system.


2.  Microphones 


The following microphones were used during this study: an AT9100
headset microphone; a Kurzweil headset microphone; a Prologue
handheld microphone; a TI handheld microphone; a Shure SM10A
headset microphone; and a Shure VR230 headset microphone. The
microphones used in this study were, in most cases, supplied by
the manufacturer of the speech recognizer being tested. When the
manufacturer did not supply a microphone, a Shure microphone from
the NCSU laboratory was used. The microphone-speech recognizer
combinations used in this study are listed in Table 2.

3.  Recording  Equipment 

This study used a JVC Model CR 6060U videocassette recorder, a
Vector Research VR 220A amplifier, two Acoustic Research AR-5
speakers, and two 3/4 inch 3M Professional VHS videocassettes for
the noise conditions tested. A GenRad 1565-B sound level meter was
also used for initial calibration of the noise and for the sound
pressure level readings.

C.  Speech  Signals 

The voices of six speakers were used for this study. The
speakers, four male and two female, had varying degrees of
familiarity with the use of speech recognizers. They had no known
hearing disorders, were native speakers of English, and ranged in
age from 24 to 38 years.

For each scenario, the recognizers were tested in the room
described below. The first test was completed under background
(ambient) noise conditions. The second and third tests were
completed with masking noise played through two loudspeakers
approximately two feet away from the speaker and recognizer. The
order of presentation of the masking noise was randomized, and the
noise consisted of either an industrial noise condition or a fast
food restaurant noise condition.

Table 2. Microphone/Recognizer Combinations Tested.

Microphone: AT9100, Kurzweil, Prologue, TI, SM10A, VR230

Recognizer
IBM               X
Intel             X
Interstate 4000   X
Interstate CSRB   X
ITT               X
Kurzweil          X
TI                X
Verbex            X
Votan             X


D.  Environment 




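All recognizers except the ITT system were tested in a large
classroom with high ceilings (11 ft.). The background noise level
in the room was measured at 45 dB(A) and 61 dB(C). (The A and C
weightings are different frequency weightings applied by the sound
level meter: A-weighting de-emphasizes low frequencies to
approximate the ear's sensitivity, while C-weighting is nearly
flat, so a C reading well above the A reading indicates
substantial low-frequency content.) The ITT system was tested in a
smaller office area; the background noise level there was not
measured but was not noticeably different.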

The sound pressure level measurements taken during the playing of
the industrial noise tape indicated an Leq = 79 dB(A). The mean
C-weighted sound pressure level was 82 dB(C). Sound pressure
levels ranged from 62 dB(A) to 84 dB(A) and from 73 dB(C) to 86
dB(C). The standard deviation of the A-weighted sound pressure
level was 5.38 dB.


The sound pressure level measurements taken during the playing of
the fast food restaurant noise tape were less variable, with a
standard deviation for A-weighted sound pressure levels of 1.36 dB
and an Leq = 80 dB(A). The mean C-weighted sound pressure level
was 84 dB(C). Sound pressure levels ranged from 78 dB(A) to 83
dB(A) and from 83 dB(C) to 87 dB(C).
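Leq is an energy average, not an arithmetic mean of decibel
readings, which is why a few loud intervals raise it noticeably.
The sketch uses the standard definition applied to equally spaced
readings; the report does not say how the GenRad meter integrated
its readings, so the equal spacing is an assumption.

```python
import math


def leq_db(levels_db):
    """Equivalent continuous level: energy average of equally spaced readings."""
    mean_energy = sum(10 ** (level / 10) for level in levels_db) / len(levels_db)
    return 10 * math.log10(mean_energy)
```

For example, equal time at 70 dB and 80 dB gives an Leq of about
77.4 dB, not 75 dB, because the louder interval carries ten times
the acoustic energy.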




The speech recognizer and the speaker were both in the free-field
area of the room with respect to the loudspeakers, so reverberant
characteristics of the room were not included in any data.


VII.  Procedure 


The procedure used for all scenarios consisted of a training
phase and a testing phase. Since the training phase varied among
scenarios, noise conditions, and recognizers, the procedures used
in this study are described according to these three factors.


A.  Training  Phase 


1. First Scenario - Discrete Task


Two male and two female speakers were trained in the discrete
speech task. Three sets of templates were made for each speaker
and for each recognizer tested. The templates were made according
to each manufacturer's recommendations for the number of
utterances of vocabulary words required for accurate recognition
(Table 3). The training procedures used for all speakers are
therefore grouped below according to the similarity of the
manufacturers' suggested procedures.


a. Votan


The training procedure for the Votan consisted of three
utterances of each vocabulary word. Each vocabulary word was spoken
once, and then this process was repeated two more times. Training
for the two noise conditions consisted of this same process, with
all utterances made in the noise condition being tested. The Votan
system stores all templates, so there was no feedback to the user
about the closeness of the templates, and no training utterances
were rejected by the system.


b.  IBM,  Intel,  Interstate  CSRB,  TI 


The four systems in this category are similar in that they all
provide some feedback to the user about the similarity between the
initial utterance of the word and the subsequent training
utterances (updated templates). At times, the recognizer rejected
an updated utterance because it did not match the initial template
formed. Therefore, the numbers of utterances listed are the
minimal number of utterances of a word, assuming that all
utterances were accepted (though this was not always the case).
The Interstate CSRB required three utterances of each vocabulary
word, while the Intel and IBM used four utterances. The words were
spoken sequentially; a complete pass was made through the
vocabulary before any word was repeated. The TI speech recognizer
required five tokens of each vocabulary word. Two utterances of
each vocabulary word were completed sequentially, and then three
additional utterances of each word were completed. Training for
the two noise conditions consisted of the same process as
described above, with all utterances spoken in the noise
condition, with the exception of the TI, which required the two
initial utterances to be made in no noise, with the three updated
utterances spoken with the noise present.
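The training procedures in this section differ mainly in how many
passes are made through the vocabulary, the noise condition for
each pass, and whether the word order is fixed or randomized by
the system. The sketch below generates such a prompt list; the
function and its parameters are hypothetical, and it does not
model rejected utterances that had to be repeated.

```python
import random


def training_schedule(vocabulary, pass_conditions, randomize=False, seed=None):
    """Prompt list of (condition, word); each pass covers the whole vocabulary.

    pass_conditions gives the noise condition for each pass, e.g. two
    quiet passes followed by three passes in noise, as described for the TI.
    """
    rng = random.Random(seed)
    prompts = []
    for condition in pass_conditions:
        words = list(vocabulary)
        if randomize:
            rng.shuffle(words)  # system-controlled random presentation order
        prompts.extend((condition, word) for word in words)
    return prompts
```

A Votan masking-noise session would be three passes all in noise;
the TI would be ["quiet", "quiet", "noise", "noise", "noise"].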


Table 3. Number of Utterances Required for Training Recognizers

                                  # utterances,   # utterances,
                     Voice        background      masking noise
Recognizer           profile      noise           Silence   Noise

IBM                               4                         4
Intel                             4                         4
Interstate 4000                   9               5         4
Interstate CSRB                   3                         3
ITT                  10 min.      3                         0
Kurzweil             1 hour       3                         3
TI                                5               2         3
Verbex                            9               5         4
Votan                             3                         3


c.  Interstate  4000  and  Verbex  4000 


The training for both of these systems was identical. Training
was controlled by the recognizer and required approximately nine
utterances of each vocabulary word. The user was not provided with
feedback on the accuracy of the word spoken in comparison to the
template of the word. However, the user had the option of
rejecting utterances if it was felt the word was not spoken
correctly. The presentation of the vocabulary words to be spoken
was randomized by the system. Training of templates required that
an initial training be completed in which each word was uttered
approximately 5 times in "quiet". A second trial was then
completed in which the user uttered the vocabulary words four
additional times under the noise conditions in which the
recognizer would be tested.

d.  Kurzweil 

The Kurzweil recognizer required that an enrollment process be
completed prior to the actual training of the vocabulary words
used in this scenario. The enrollment process forms a "voice
profile" which the system requires for each user. The enrollment
process took approximately one hour and is system controlled.
Training of the words used in this study required three utterances
of each vocabulary word, with the words presented serially. The
training process was repeated for each of the noise conditions,
with all utterances spoken with the appropriate noise background.

e.  ITT 

The ITT recognizer also required a "voice profile", though this process took approximately ten minutes. The actual training process consisted of three utterances of each vocabulary word. The templates were not remade for each noise condition; they were made for the background (no noise) condition only. However, "silence templates" were recalibrated (adapted) for the two noise conditions.


2.  Second  Scenario  -  Connected  Speech 

Two male and two female speakers were used for this task. Each speaker attempted to make three sets of templates for each recognizer. However, templates could not be made for two speakers using the TI recognizer in any noise condition (apparently due to the excessive memory required to store the speech templates for their slow speech), one speaker was unable to use the TI in the fast food restaurant noise condition (also apparently due to insufficient memory), and one speaker was unable to use the Interstate 4000 in the industrial noise condition (apparently due to the interaction between his voice and the background noise). The templates were made in accordance with the manufacturers' specified procedures and, thus, the training procedure varied between the recognizers.

a.  Votan 

The Votan recognizer required three utterances for each word in the vocabulary. The procedure used was the same as described under the discrete task training procedures. The Votan does permit extraction of connected speech templates; however, when this was attempted, the system ran out of memory space for templates.


b.  TI 


The TI recognizer required the speaker to say a sentence and then to say isolated words from the sentence. The sentences were defined and developed by the recognizer. After all words had been spoken using this process, the words were updated three times using the system-defined sentences for updating. This process was repeated for each of the two noise conditions, except that the initial training was done in quiet, with the three update passes being done with the appropriate noise present.

c.  Verbex  4000  and  Interstate  4000 

The training required for both of these recognizers was, again, identical. The first phase consisted of the speaker saying each word in the vocabulary using discrete speech. This was followed by approximately four utterances of each word spoken in connected speech, with a sentence or sentence fragment as the prompt. The updating phase consisted of each speaker making approximately four utterances of each word in connected speech, again prompted with sentences or sentence fragments. This process was repeated for each of the noise conditions, with the initial phase completed in silence and the update phase completed with the masking noise present.

d.  ITT 

The training for the ITT system was controlled by the ITT representative present during the training and testing of this system. Training was continued until the templates had been fine-tuned to the representative's specification; thus, training varied greatly between the speakers. The ITT recognizer did not require retraining for the two masking noise conditions; only the "silence templates" were updated.


3.  Third  Scenario  -  Connected  Speech 


Two male speakers and one female speaker were used for this scenario. Three sets of templates were made for each speaker for each recognizer tested. The three recognizers tested in this scenario, the Interstate 4000, ITT, and Verbex 4000 (the only systems which supported adequate syntaxing for this scenario), were trained using the same procedures as described for the second scenario.




B.  Testing  Phase 

1.  First  Scenario  -  Discrete  Speech 


Each speaker repeated the 16 words in the vocabulary ten times in random order. Each word was recorded as correctly recognized, not recognized (rejected), or misrecognized; for each misrecognition, the word returned by the recognizer was also recorded. This process was repeated for each of the noise conditions for each recognizer. The order in which the recognizers and the noise conditions were tested was randomized to minimize any order effects, except that the background noise condition (no noise) was presented first in all cases.
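The bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually used in the study: it assumes each trial is logged as a (spoken word, recognized word) pair, with `None` standing for a rejection, and the function names `presentation_order` and `score_trials` are hypothetical.

```python
import random
from collections import Counter


def presentation_order(vocabulary, repetitions, seed=None):
    """Return each vocabulary word `repetitions` times, in shuffled order."""
    rng = random.Random(seed)
    order = [word for word in vocabulary for _ in range(repetitions)]
    rng.shuffle(order)
    return order


def score_trials(trials):
    """Tally (spoken, recognized) pairs; recognized is None for a rejection.

    Returns (outcomes, confusions): outcome counts by category, and counts
    of (spoken, recognized) pairs for misrecognitions only.
    """
    outcomes = Counter()
    confusions = Counter()
    for spoken, recognized in trials:
        if recognized is None:
            outcomes["rejected"] += 1
        elif recognized == spoken:
            outcomes["correct"] += 1
        else:
            outcomes["misrecognized"] += 1
            confusions[(spoken, recognized)] += 1
    return outcomes, confusions


# Example with a hypothetical three-trial log.
outcomes, confusions = score_trials([("one", "one"), ("stop", None), ("zero", "one")])
print(dict(outcomes), dict(confusions))
```

With the 16-word vocabulary and ten repetitions, `presentation_order` yields 160 trials per recognizer and noise condition, the maximum shown in Figure 1; the `confusions` counter holds the raw material for a confusion matrix.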




2.  Second  and  Third  Scenarios  -  Connected  Speech 


Each of the sentences used in these scenarios was repeated four times in sequential order. This process was repeated for each noise condition and for each recognizer tested. For further analysis, each sentence was recorded as recognized correctly, rejected (no sentence or sentence fragment recognized), or misrecognized.


VIII.  Results 


The results from the three speech scenarios used to assess the performance of the speech recognizers are presented separately. Because the scenarios differ in nature, their results cannot be compared directly. Additionally, since the systems were tested using their default parameter settings, some exhibited forced recognition (the recognizer returned a match for every utterance, so a substitution error was preferred over a rejection error), while other systems enforced a minimum separation (a rejection was preferred over a substitution).
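The difference between the two default behaviors can be illustrated with a toy nearest-template matcher. This is a hypothetical sketch: real recognizers compare sequences of acoustic frames rather than single feature vectors, and the names `classify`, `reject_distance`, and `min_separation` are illustrative, not parameters of any system tested here.

```python
import math


def classify(features, templates, reject_distance=None, min_separation=None):
    """Match a feature vector against stored templates (word -> vector).

    With both thresholds left at None the rule is "forced": the nearest
    template is always returned, so every error is a substitution.
    Setting either threshold lets the rule reject (return None) instead.
    """
    # Rank templates by Euclidean distance to the input vector.
    ranked = sorted((math.dist(features, vec), word) for word, vec in templates.items())
    best_dist, best_word = ranked[0]
    if reject_distance is not None and best_dist > reject_distance:
        return None  # no template is close enough
    if min_separation is not None and len(ranked) > 1:
        if ranked[1][0] - best_dist < min_separation:
            return None  # best and runner-up are too close to call
    return best_word
```

Leaving both thresholds at `None` reproduces forced recognition; raising either one trades substitution errors for rejections.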


A.  First  Scenario 


An Analysis of Variance procedure (Scheffe, 1959; Searle, 1971) was used to initially evaluate the data. The dependent variable was the number of words correctly recognized, and the independent variables consisted of the following: recognizer, noise condition, speaker, recognizer by noise condition, and recognizer by speaker. This model accounted for 90% of the total variance.



As noted in Table 4, there were significant differences due to the recognizer used, the type of noise in the environment, and the speaker providing the signal. Additionally, the recognizer-by-noise and recognizer-by-speaker interactions were significant.

Table 4.  Analysis of Variance:  Discrete Data

   Source                  df        SS         F        p
   Recognizer               8    31640.30    37.68    < .0001
   Noise                    2     5218.35    24.70    < .0001
   Speaker                  3     1975.51     6.23    < .0011
   Recognizer * Noise      16     8135.65     4.81    < .0001
   Recognizer * Speaker    24     5543.41     2.19    < .0088

   Key:
   df   Degrees of freedom
   SS   Sum of squares of deviations
   F    Computed F ratio
   p    Probability that the observed F ratio is due to chance *

   * Significance is arbitrarily defined at p <= .05
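Each F ratio in Table 4 is the effect mean square (SS/df) divided by the error mean square, which the table does not list. As a worked check on the table, assuming the standard single-error-term ANOVA layout, the short Python script below backs out the error mean square implied by each row. Every row gives roughly 105, which is consistent with reading the recognizer degrees of freedom as 8 (nine recognizers minus one).

```python
# Rows of Table 4: (source, df, SS, F).
TABLE_4 = [
    ("Recognizer", 8, 31640.30, 37.68),
    ("Noise", 2, 5218.35, 24.70),
    ("Speaker", 3, 1975.51, 6.23),
    ("Recognizer * Noise", 16, 8135.65, 4.81),
    ("Recognizer * Speaker", 24, 5543.41, 2.19),
]


def implied_error_mean_square(df, ss, f):
    """Since F = (SS / df) / MS_error, solve for MS_error = (SS / df) / F."""
    return (ss / df) / f


for source, df, ss, f in TABLE_4:
    effect_ms = ss / df
    error_ms = implied_error_mean_square(df, ss, f)
    print(f"{source:<22} MS = {effect_ms:8.2f}   implied MS_error = {error_ms:6.2f}")
```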


These effects were further analyzed using Tukey's Studentized Range Test (Tukey, 1952; Dunnett, 1980). Table 5 presents a matrix of the significant differences (p <= .05) obtained between the recognizers based on the number of correct recognitions. For example, the Votan performed significantly better than the Verbex and Interstate 4000 systems. The mean recognition rates obtained for the individual recognizers and the Tukey analysis can be found in Appendix VII and Appendix VIII, respectively.

A Tukey test for the noise effect demonstrated that the mean correct recognition rate for the no noise condition was significantly higher than the mean recognition rates for both the industrial noise condition and the fast food restaurant noise condition (Figures 1, 2, and 3, with details in Appendix IX). Analysis of the speaker effect indicated significantly higher correct recognition rates for speaker 3 than for speakers 2 and 4. The results also demonstrated significantly higher recognition rates for speaker 1 than for speaker 4 (Figures 4, 5, and 6).
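Matrices such as Table 5 can be produced from group means with Tukey's honestly significant difference (HSD). The Python sketch below assumes equal group sizes and takes the studentized range critical value `q_crit` as an input (from a printed table, or from `scipy.stats.studentized_range.ppf` in SciPy 1.7 or later); `tukey_sign_matrix` is a hypothetical helper, not the software used in the study.

```python
import math
from itertools import combinations


def tukey_sign_matrix(means, ms_error, n_per_group, q_crit):
    """Build a '+'/'-' matrix in the style of Tables 5 through 7.

    means: dict name -> mean score (higher is better).
    q_crit: studentized range critical value q(alpha; k, df_error),
            supplied by the caller rather than computed here.
    """
    hsd = q_crit * math.sqrt(ms_error / n_per_group)  # honestly significant difference
    names = list(means)
    matrix = {row: {col: "" for col in names} for row in names}
    for a, b in combinations(names, 2):
        diff = means[a] - means[b]
        if diff > hsd:        # row a significantly better than column b
            matrix[a][b], matrix[b][a] = "+", "-"
        elif -diff > hsd:     # column b significantly better than row a
            matrix[a][b], matrix[b][a] = "-", "+"
    return matrix
```

With unequal group sizes, the Tukey-Kramer variant replaces `n_per_group` with the harmonic mean of each pair's two group sizes.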



Table 5.  Differences Between Recognizers
[Tukey Test (alpha = .05)]
(+ = row significantly better than column;
 - = column significantly better than row)

Recognizer | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
-----------+---+---+---+---+---+---+---+---+---+
1          |   |   | - | - | - | - | - | - | - |
2          |   |   | - | - | - | - | - | - | - |
3          | + | + |   |   |   |   |   |   | + |
4          | + | + |   |   |   |   |   |   | + |
5          | + | + |   |   |   |   |   |   |   |
6          | + | + |   |   |   |   |   |   | + |
7          | + | + |   |   |   |   |   |   |   |
8          | + | + |   |   |   |   |   |   | + |
9          | + | + | - | - |   | - |   | - |   |

Key:
1 = Verbex
2 = Interstate 4000
3 = Votan
4 = TI
5 = IBM
6 = Intel
7 = Interstate CSRB
8 = ITT
9 = Kurzweil


[Figure 1. Correct Recognition by Noise (160 maximum): correct recognitions for each recognizer under the no noise, fast food, and industrial noise conditions]

[Figure 6. Misrecognition by Speaker: misrecognitions for each recognizer, by speaker]


A summary of the recognizer-by-noise interaction effect, as analyzed by the Tukey test, is shown in Tables 6 and 7, with the plus and minus signs having the same meaning as in the previous table. This summary is limited to differences in which the noise condition was held constant. No results are given for the no noise condition, as there were no significant differences in correct word recognition rates between recognizers under that condition. The complete results of this analysis are located in Appendix X.




Table 6.  Differences Between Recognizers:  Industrial Noise
          Discrete Speech
[Tukey Test (alpha = .05)]
(+ = row significantly better than column;
 - = column significantly better than row)

Recognizer | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
-----------+---+---+---+---+---+---+---+---+---+
1          |   |   | - | - | - | - | - | - |   |
2          |   |   | - | - | - | - | - | - | - |
3          | + | + |   |   |   |   |   |   | + |
4          | + | + |   |   |   |   |   |   |   |
5          | + | + |   |   |   |   |   |   |   |
6          | + | + |   |   |   |   |   |   | + |
7          | + | + |   |   |   |   |   |   |   |
8          | + | + |   |   |   |   |   |   | + |
9          |   | + | - |   |   | - |   | - |   |

Key:
1 = Verbex
2 = Interstate 4000
3 = Votan
4 = TI
5 = IBM
6 = Intel
7 = Interstate CSRB
8 = ITT
9 = Kurzweil




Table 7.  Differences Between Recognizers:  Fast Food Restaurant
          Discrete Speech
[Tukey Test (alpha = .05)]
(+ = row significantly better than column;
 - = column significantly better than row)

Key:
1 = Verbex
2 = Interstate 4000
3 = Votan
4 = TI
5 = IBM
6 = Intel
7 = Interstate CSRB
8 = ITT
9 = Kurzweil


Results from the Tukey analysis for the recognizer-by-speaker interaction indicated significantly lower correct recognition rates on the Interstate 4000 and Verbex for speakers 2 and 4 (female and male) as compared with all other recognizers tested. For speakers 1 and 3 (female and male), this significantly lower performance rate was found only with the Interstate 4000.


Confusion matrices for the recognizers having more than 10 misrecognitions, and error matrices for all recognizers, are in Appendix XI.

B.  Scenario  Two  -  Connected  Speech 

The data for the second scenario were initially analyzed using a General Linear Models (GLM) procedure (Goodnight, 1971; Sail, 1978). This statistical analysis was used because of missing data: for speakers 1 and 4 under all conditions on the Texas Instruments recognizer, and for speaker 3 under the industrial noise condition on the Interstate 4000 and under the fast food noise condition on the Texas Instruments recognizer. Although these speakers made several attempts to use these systems, their templates were either too large for memory (TI) or the recognizer was unable to detect any utterances made by the speaker (Interstate 4000).

The dependent variable for the GLM model was the number of correct sentences, while the independent variables were recognizer, noise condition, and speaker. The interaction effects of recognizer by speaker and recognizer by noise condition were also included. This model accounted for 90% of the total variance. Table 8 indicates that all main effects were significant, as was the recognizer-by-speaker interaction.

The significant effects were further analyzed using a Tukey Test with an alpha level of .05. Table 9 illustrates the significant differences in performance rates (correct sentences recognized) between recognizers. A plus sign indicates that the recognizer listed in the row performed significantly (p <= .05) better than the recognizer listed in the column; a minus sign indicates the opposite.


Results of the Tukey test for the noise effect demonstrated significantly higher recognition rates for the no noise and fast food restaurant noise conditions than for the industrial noise condition. Results from the analysis of the speaker effect indicated that, overall, speaker 2 (female) attained significantly higher recognition rates than speaker 4 (male) (Appendix XII).

Tables  10  through  13  are  matrices  indicating  significant 
differences  for  the  speaker  by  recognizer  interaction.  The 




Table 10.  Differences Between Recognizers:  Speaker 1
[Tukey Test (alpha = .05)]

Recognizer | Votan |  TI  | Inter | Verbex | ITT |
-----------+-------+------+-------+--------+-----+
Votan      |       |      |   -   |        |  -  |
TI         |       |      |       |        |     |
Interstate |   +   |      |       |        |     |
Verbex     |       |      |       |        |     |
ITT        |   +   |      |       |        |     |

Table 11.  Differences Between Recognizers:  Speaker 2
[Tukey Test (alpha = .05)]

Recognizer | Votan |  TI  | Inter | Verbex | ITT |
Votan  I 

TI  i 

Interstate  I 


Verbex 


* 


Table 12.  Differences Between Recognizers:  Speaker 3
[Tukey Test (alpha = .05)]

Recognizer | Votan |  TI  | Inter | Verbex | ITT |
-----------+-------+------+-------+--------+-----+
Votan      |       |      |       |        |  -  |
TI         |       |      |       |        |     |
Interstate |       |      |       |        |     |
Verbex     |       |      |       |        |     |
ITT        |   +   |      |       |        |     |


Table 13.  Differences Between Recognizers:  Speaker 4
[Tukey Test (alpha = .05)]

Recognizer | Votan |  TI  | Inter | Verbex | ITT |
-----------+-------+------+-------+--------+-----+
Votan      |       |      |   -   |   -    |  -  |
TI         |       |      |       |        |     |
Interstate |   +   |      |       |        |     |
Verbex     |   +   |      |       |        |  -  |
ITT        |   +   |      |       |   +    |     |

Although the recognizer-by-noise interaction was not significant for this scenario, a Tukey analysis was still completed. The results from this analysis, with the noise conditions held constant, are located in Tables 14 through 16. The complete results of this analysis are located in Appendix XIV.

Matrices for the rejection and misrecognition errors for the sentences are located in Appendix XV. The misrecognition errors were further analyzed using error trees, which are located in Appendix XVI.




Since the method used to construct the data for the Tukey test from the GLM procedure does not yield mean recognition scores, these are included in separate tables in Appendix XVII.


Table 14.  Differences Between Recognizers:  No Noise
[Tukey Test (alpha = .05)]

Recognizer | Votan |  TI  | Inter | Verbex | ITT |
-----------+-------+------+-------+--------+-----+
Votan      |       |      |       |        |  -  |
TI         |       |      |       |        |     |
Interstate |       |      |       |        |  -  |
Verbex     |       |      |       |        |     |
ITT        |   +   |      |   +   |        |     |

Table 15.  Differences Between Recognizers:  Industrial Noise
[Tukey Test (alpha = .05)]

Recognizer | Votan |  TI  | Inter | Verbex | ITT |
-----------+-------+------+-------+--------+-----+
Votan      |       |      |       |        |  -  |
TI         |       |      |       |        |     |
Interstate |       |      |       |        |  -  |
Verbex     |       |      |       |        |  -  |
ITT        |   +   |      |   +   |   +    |     |



Table 16.  Differences Between Recognizers:  Fast Food Noise
[Tukey Test (alpha = .05)]

Recognizer | Votan |  TI  | Inter | Verbex | ITT |
-----------+-------+------+-------+--------+-----+
Votan      |       |      |       |   -    |  -  |
TI         |       |      |       |        |  -  |
Interstate |       |      |       |        |     |
Verbex     |   +   |      |       |        |     |
ITT        |   +   |  +   |       |        |     |

C.  Scenario  Three  -  Connected  Speech 


An Analysis of Variance procedure was used to initially analyze the data for this scenario. The dependent variable was the number of correct sentences, with independent variables consisting of recognizer, noise, speaker, recognizer by noise, and recognizer by speaker. This model accounted for 80% of the total variance.

As noted in Table 17, only two main effects achieved significance: recognizer and speaker. The data for this scenario are limited by the reduced number of speakers and recognizers tested. Therefore, independent variables that might otherwise have had a significant effect can only be viewed as having a tendency to affect recognition rates.


Table 17.  Analysis of Variance:  Scenario 3

   Source                  df       SS         F       p
   Recognizer               2    1558.74     4.92     .03
   Noise                    2    1185.85     3.52     .06
   Speaker                  2    3842.30    11.39    < .01
   Recognizer * Noise       4     865.48     1.28     .33
   Recognizer * Speaker     4     722.37     1.07     .41

The two significant main effects were further analyzed using Tukey's Studentized Range Test with an alpha of .05. Results from these analyses demonstrate that the ITT recognizer achieved significantly higher recognition rates than the Interstate 4000. Results for the speaker effect indicate that speaker 1 achieved significantly higher recognition rates than speakers 2 and 3. The complete results from these analyses are in Appendix XVIII. No further statistical analysis was performed on the data from scenario three, as no other significant effects were observed.

Matrices for the rejection and misrecognition errors for the sentences are in Appendix XIX. The misrecognition errors were further analyzed using error trees, which are in Appendix XX.


IX.  Discussion 


The discussion section is in three parts. The first contains observations that apply to the project as a whole; the remaining two contain remarks pertinent mainly to discrete and connected speech, respectively.

A.  General 

It is not surprising that recognizer performance was higher, overall, in low noise than in high noise conditions; after all, people hear better in low noise too. Yet there have been reports of recognizers performing as well or better in very noisy environments than in less noisy ones. The results reported here do not settle this question in either direction. Discrete speech recognition was better under low noise, but connected speech recognition was better under both low noise and fast food restaurant high noise (i.e., mostly voice noise) than under industrial high noise, with its much wider frequency spectrum. These effects, however, were not uniform: noise had different effects on different recognizers.

The major source of variability in speech recognition is the individual speaker. Both inter- and intra-speaker variability occur, often to a high degree. The effects of inter-speaker variation are minimized when comparing ASRs, provided the same speaker population is used throughout, which it was. While this statistically minimizes the effects of inter-speaker variance, it is still important to recognize that a significant recognizer-by-speaker interaction term was observed in all but the most tightly constrained case. This implies that it may be necessary to match speakers and recognizers in many applications.


Intra-speaker variation is affected by many psychological and physiological factors. These factors affect speech recognition within a single speaker over time periods as short as a few seconds. Day-to-day variations may be considerable and may persist for a long time. Thus, if a speaker has an "off" day when a certain recognizer is being tested, the results may be prejudiced against that system (which argues for replication). Significant drift experienced by motivated speakers, however, should change relatively slowly, so the random deviations should move slowly about the target templates. If the sessions are not overly long, the net effect should be tolerable. Having more speakers would have reduced the effect of this variance and thereby increased the probability of significant findings; the results reported here are therefore somewhat conservative.




With a small speaker pool, the order in which recognition systems are tested may be important, because speakers gain experience with speech recognizers as testing proceeds. Thus, although order was randomized as far as possible, some order effects may still contaminate the results. Without a large speaker pool, which allows full randomization of the order of testing, this factor, if indeed significant, cannot be eliminated.




B.  Discrete  Speech 


All of the above remarks apply in general to the discrete speech part of the experiment. Specific results indicated that the Verbex 4000 and the Interstate 4000 performed significantly worse than the other systems. Both of these machines were designed specifically for connected speech and were, therefore, operating at a severe disadvantage on an isolated word task. The Kurzweil recognizer, despite its long enrollment process and its design for large vocabularies, did relatively poorly on the 16-word vocabulary. We do not discount the possibility that the excessively long enrollment procedure (which may be necessary for larger vocabularies) may have "put speakers off" this device, either consciously or subconsciously.

This study found a significant interaction between the various recognizers and noise conditions. The recognizer-by-noise term implies that no one of the current recognizers is likely to have the best performance under all noise conditions. This interaction appears to be due primarily to the better relative performance of the Kurzweil system with the fast food background noise. For this reason, the interaction may not be observed under different conditions.

The 16-word vocabulary chosen to test the discrete speech recognizers consisted of the ten digits and six control words. This is not a particularly difficult vocabulary, and the fact that not all recognizers performed at near-perfect levels may be a strong, but sad, indication of how much progress remains to be made in ASR design. Even more strongly, it indicates how important the user and the user environment are. There is little doubt that with carefully chosen, highly trained, and motivated users in a controlled environment (not necessarily noise free), near-perfect performance can be obtained by all vendors on such small vocabularies. The use of "average" speakers in a loosely controlled environment is, however, justified as being closer to likely application scenarios.

Error analysis consisted of tabulating and analyzing rejection errors and substitution errors (noting the substituted word). Speakers were not allowed to enter non-vocabulary words or sounds, so insertion errors did not occur. (An insertion error occurs when the recognizer interprets some non-vocabulary word, or a sound such as a cough, as a word in the vocabulary.) To measure performance with respect to insertion error avoidance, the recognizer threshold values (or their equivalents) would have to be adjusted, which was deemed impractical at the time.

The generic approach of setting the recognition parameters at the manufacturers' recommended levels was taken. Experience has shown, however, that depending on the demands of the application, much benefit may result from adjusting these parameters. Indeed, there is evidence that individual speakers may benefit from fine-tuning these parameters to their idiosyncratic needs. A close study of how threshold adjustment might affect the relative performance among recognizers was beyond the scope of the project, however, and should be considered for future research.


C.  Connected Speech


Table  9  indicates  that  the  ITT  ASR  did  significantly  better 
than  the  other  four,  and  that  the  VOTAN  did  significantly  worse 
than  all  but  one  other  recognizer.  There  are  several  factors 
that  may  account  for  this  observation. 

In the case of the ITT machine, testing was done in a different room because of overheating problems in the laboratory that housed the other tests. Additionally, the ITT system was a prototype and required ITT technical assistance. As a prototype, it lacked some of the necessary enrollment evaluation routines. This forced technical intervention in the enrollment stage and perhaps allowed the system to be tailored more precisely to the scenario than was possible with the development tools available for the other systems. Because of its substantially greater computing power and memory, the ITT machine also permitted the strictest syntaxing, which further helps to account for its better results.




The use of "syntaxing" is a crucial factor, especially in connected speech, but in discrete speech as well. All the recognizers tested operate by comparing a current template with a series of prestored templates gathered during enrollment. The more prestored templates there are, the greater the chance for error. Syntaxing allows the comparison to be made on a subset of the prestored templates, hence reducing the possibility for error. The term syntaxing is used because the subsetting operation is generally based on the permissible co-occurrence of utterances. By designing transactions that permit "heavy" syntaxing, that is, the reduction of candidate templates by an average factor of three or more, recognition performance can be dramatically improved. As noted above, part of the reason for the success of the ITT recognizer was its ability to handle heavy syntaxing.
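The effect of syntaxing can be shown with a toy matcher in which a hypothetical grammar lists the words allowed after each word. This is a sketch under simplifying assumptions (one feature vector per word, greedy left-to-right matching); none of the tested recognizers is claimed to work this way, and `GRAMMAR` and `recognize_sequence` are illustrative names.

```python
import math

# Hypothetical grammar: words allowed after a given word (None = sentence start).
GRAMMAR = {
    None: {"set", "clear"},
    "set": {"alpha", "bravo"},
    "clear": {"alpha", "bravo"},
}


def recognize_sequence(vectors, templates, grammar):
    """Greedy left-to-right matcher.

    Each input vector is compared only with the templates of words the
    grammar allows after the previously recognized word; if the grammar
    has no rule for that word, all templates are candidates.
    """
    previous, output = None, []
    for vec in vectors:
        allowed = grammar.get(previous, set(templates))
        candidates = {w: t for w, t in templates.items() if w in allowed}
        word = min(candidates, key=lambda w: math.dist(vec, candidates[w]))
        output.append(word)
        previous = word
    return output
```

In this example grammar each position has two candidates out of four templates, a reduction by a factor of two; an input that lies acoustically closer to "clear" than to "alpha" is still read as "alpha" after "set", because "clear" is not a permitted successor.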

The VOTAN device, on the other hand, permitted no syntaxing whatsoever, which put it at a disadvantage compared to the other machines. The only ASR that the VOTAN did not perform worse than was the TI. The TI machine, however, refused to perform at all for two of the four speakers, whose natural speaking rate was slower, apparently due to limits on available memory. In one sense, this lack of data severely compromises any conclusions that could be drawn about the TI machine, or any comparison between it and the VOTAN. In another sense, however, the comparisons are quite valid, since the TI was functionally unable to perform the task. Thus, while syntaxing may substantially reduce errors, it may be a two-edged sword: it may increase the task support complexity beyond the capacity of many recognition systems.

Error analysis in connected speech is far more difficult than in discrete speech because a much wider variety of errors is possible. In addition to substitution errors, and possibly rejection of all or part of the input, the following errors may occur:

     Insertion errors: extra words are inserted.
     Deletion errors: spoken words are omitted.
     Merge errors: two or more words are recognized as a smaller number of words.
     Split errors: one or more words are recognized as a larger number of words.

These errors may occur in any combination and in any number in a given utterance, sometimes leading to recognizer output that is best described as "word hash."

One way of reporting results for connected speech is simply to report the percentage of sentences interpreted without error. This was the initial basis for comparison among the five ASRs used in this connected speech scenario. Additionally, individual tabulations were completed to indicate which of the sentences the recognizers completely rejected (no recognition of any word) and which of the sentences contained any of the previously mentioned errors (Appendix XV). The sentences that contained some type of recognition error were further analyzed using error trees (Appendix XX).
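Substitution, insertion, and deletion counts for a sentence are commonly obtained by aligning the spoken and recognized word strings with a minimum edit distance (Levenshtein) alignment. The Python sketch below is such an alignment, offered as an illustration rather than the method used to build the error trees in this study; `align_counts` is a hypothetical name. A plain alignment has no merge or split category, so those errors surface as combinations of the other three.

```python
def align_counts(reference, hypothesis):
    """Minimum edit distance alignment of two word lists.

    Returns (substitutions, deletions, insertions). A merge shows up as a
    substitution plus a deletion, and a split as a substitution plus an
    insertion.
    """
    n, m = len(reference), len(hypothesis)
    # cost[i][j] = (total edits, subs, dels, ins) aligning reference[:i] with hypothesis[:j]
    cost = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)   # every reference word deleted
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)   # every hypothesis word inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t, s, d, k = cost[i - 1][j - 1]
            miss = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            diag = (t + miss, s + miss, d, k)    # match or substitution
            t, s, d, k = cost[i - 1][j]
            up = (t + 1, s, d + 1, k)            # spoken word omitted
            t, s, d, k = cost[i][j - 1]
            left = (t + 1, s, d, k + 1)          # extra word inserted
            cost[i][j] = min(diag, up, left)     # fewest total edits first
    _, subs, dels, ins = cost[n][m]
    return subs, dels, ins
```

For example, the spoken words "go to" recognized as the single word "goto" count as one substitution and one deletion, and a completely rejected sentence counts as all deletions.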




The error trees compiled for this study were developed to assess the degree and manner in which an individual recognizer errs in word identification once at least one word in a sentence has been misrecognized, omitted, or inserted. Such an analysis is of interest because certain types of errors are easier to handle than others (depending on the magnitude of the errors). For example, suppose two recognizers achieve the same percentage of correct sentences for a given scenario, but when the first recognizer errs, it returns a sentence that is totally incoherent and unrelated to what was said, whereas the second recognizer returns a sentence with a single substitution error. The second recognizer's performance should be considered superior to the first's, since its error could be corrected more easily.

In an attempt to quantify errors made in Scenario 2, two values were
computed for each recognizer by noise condition, length of transaction
(3, 4, or 5 words), and speaker. These values (right margin of
Appendix XVI) are the number of words correctly identified (R) over
the total number of words (L) in the transaction (R/L), and the number
of wrong words (insertions, merges, splits, and misrecognitions) (W)
over the total number of words (L) in the transaction (W/L). The
numbers listed are averages obtained from an analysis of all the
errors for the particular recognizer, noise condition, sentence
length, and speaker. Since the number of sentences containing errors
varied by recognizer (Appendix XIX), these percentages indicate only
the type and degree of error that occurs when an error does occur.

The value of the first percentage, R/L, ranged from 0 to 1. Zero
indicated that the recognizer did not correctly identify any word
spoken. One meant that the recognizer correctly recognized all the
words spoken but also inserted words that had not been spoken. (Only
sentences containing an error were scored, so R/L = 1 implies extra
words.) The value of the second percentage, W/L, is in theory
unbounded, and it approaches infinity when a recognizer cannot detect
the end point of an utterance (as might occur in a high noise
environment). In this study, however, the value rarely exceeded 1,
except when the recognizer used strict syntaxing and misrecognized
the utterance completely.

In theory, all other factors being equal, the better recognizers
(those most amenable to present-day error detection and correction
strategies) would be those with high R/L percentages and low W/L
percentages. Two caveats must be made. First, this scoring method
judges an omission error to be better than a substitution error,
which is debatable. For example, suppose the sentence "Driver move
tank out slower" was uttered. If the recognizer returned the sentence
"Driver move tank out faster", the first percentage would be
R/L = .8 (4/5 = .8) and the second would be W/L = .2 (1/5 = .2).
(This type of error could also lead to serious problems for the
driver of the tank!) However, if the recognizer returned the sentence
"Driver move tank out", R/L would still equal .8, but W/L would now
equal zero. (In that case, the driver of the tank could simply ask
for the correct speed at which to move the tank!) Second, these
percentages do not indicate whether the recognizer is consistently
making the same mistakes for the same sentences (in which case
retraining the recognizer may solve the problem) or whether the
errors are inconsistent and randomly distributed (which may require
redesign of the application).
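The two ratios are simple to compute once R, W, and L are known for a transaction. The Python sketch below reproduces the tank example. It also shows the averaging over erroneous sentences only, which is how the tabulated values were obtained. The function names are hypothetical, and the sketch assumes that R and W have already been counted, for instance by hand or from an alignment.

```python
from typing import List, Tuple


def rl_wl(correct: int, wrong: int, length: int) -> Tuple[float, float]:
    """Return (R/L, W/L) for one transaction.

    correct: R, words correctly identified
    wrong:   W, insertions + merges + splits + misrecognitions
    length:  L, number of words actually spoken
    """
    if length <= 0:
        raise ValueError("a transaction must contain at least one word")
    return correct / length, wrong / length


def mean_rl_wl(error_sentences: List[Tuple[int, int, int]]) -> Tuple[float, float]:
    """Average R/L and W/L over (R, W, L) triples for sentences that
    contained an error. Because only erroneous sentences are averaged,
    the result describes how badly a recognizer errs, not how often."""
    if not error_sentences:
        raise ValueError("no erroneous sentences to average")
    ratios = [rl_wl(r, w, l) for r, w, l in error_sentences]
    n = len(ratios)
    return sum(r for r, _ in ratios) / n, sum(w for _, w in ratios) / n


print(rl_wl(4, 1, 5))  # "Driver move tank out faster" -> (0.8, 0.2)
print(rl_wl(4, 0, 5))  # "Driver move tank out"        -> (0.8, 0.0)
```

The omission scores better on W/L than the substitution does, which is the first caveat stated above.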

The two percentages could also be supplemented by a third number,
which was not computed for this project. This number would reflect
the degree to which the recognizer is able to identify the correct
number of word boundaries (the correct number of words uttered). It
could be defined as the difference between the number of words
uttered and the number of words recognized (D), divided by the total
number of words (L) in the sentence (D/L). The better recognizers
(those more amenable to error detection and correction) would be
those whose D/L ratio approached zero.
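The proposed D/L ratio can be sketched directly from its definition. This Python sketch keeps the sign, which is an assumption, because the text only says that better recognizers approach zero. A positive value means too few word boundaries were found (omissions or merges), and a negative value means too many were found (insertions or splits). The name `d_over_l` is hypothetical.

```python
def d_over_l(words_uttered: int, words_recognized: int) -> float:
    """D/L = (words uttered - words recognized) / words uttered.

    Positive: too few words recognized (omissions or merges).
    Negative: too many words recognized (insertions or splits).
    Zero: the word count is right, although substitutions may remain.
    """
    if words_uttered <= 0:
        raise ValueError("an utterance must contain at least one word")
    return (words_uttered - words_recognized) / words_uttered


print(d_over_l(5, 5))  # "Driver move tank out faster": 0.0
print(d_over_l(5, 4))  # "Driver move tank out": 0.2
```

In the tank example, D/L is zero for the substitution and .2 for the omission. It therefore flags exactly the case that W/L scores as error free. Using the absolute value would treat too many and too few boundaries alike.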

The R/L and W/L ratios, though initially computed at the lowest level
of recognizer * noise * sentence length * speaker, are also listed for
the levels recognizer * noise * sentence length; recognizer * noise;
and recognizer (Appendix XX). Again, because the numbers are averages
over incorrect utterances only, and not over all utterances, the data
must be reviewed cautiously. A recognizer that misses one utterance in
a thousand but reports no correct words for that sentence will appear
worse than a recognizer that always returns one or two errors in each
sentence. Therefore, the error trees must be interpreted together
with the additional data in Appendix XV. When two recognizers have
approximately the same number of misrecognized sentences, the error
trees can be used for meaningful comparisons.
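Rolling the ratios up from recognizer * noise * sentence length * speaker to coarser levels can be sketched as grouping on a prefix of the factor tuple. The text does not say whether the higher-level figures average the individual erroneous sentences or the lower-level cell means. This Python sketch assumes the former. The record layout and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One record per misrecognized sentence:
# (recognizer, noise, sentence_length, speaker, r_over_l, w_over_l)
Record = Tuple[str, str, int, str, float, float]


def average_by_level(records: List[Record], depth: int) -> Dict[tuple, Tuple[float, float]]:
    """Average R/L and W/L grouped by the first `depth` factors:
    4 = recognizer*noise*length*speaker, 3 = recognizer*noise*length,
    2 = recognizer*noise, 1 = recognizer."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # [sum R/L, sum W/L, count]
    for rec in records:
        acc = sums[rec[:depth]]
        acc[0] += rec[4]
        acc[1] += rec[5]
        acc[2] += 1
    return {key: (r / n, w / n) for key, (r, w, n) in sums.items()}
```

As the caution above says, these averages describe erroneous sentences only. They should be read next to the count of misrecognized sentences for each cell.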

The error trees for the third scenario were constructed to accurately
reflect the types of errors that occurred with the connected digits.
Because the scenario employed a highly restricted syntax, the words
other than the connected digits were correctly recognized, with only
a few exceptions.

The three error trees thus reflect the types of errors (S =
substitutions, I = insertions, O = omissions) that occurred for each
of the recognizers tested (Verbex, ITT, Interstate 4000). The results
for each recognizer were further categorized by the noise condition
in which the recognizer was tested (N). At the lowest level, the data
reflect recognizer errors by noise (N), by error type (S, I, O), and
by speaker (s1, s2, s4).
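The nesting of these error trees (recognizer, then noise condition, then error type, then speaker) can be sketched as nested dictionaries with a counter at each leaf. In this Python sketch, each observed digit error is assumed to be recorded as a (recognizer, noise, error type, speaker) tuple. The noise labels and the function name `error_tree` are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

# One observed digit error: (recognizer, noise, error_type, speaker),
# where error_type is "S" (substitution), "I" (insertion) or "O" (omission).
Error = Tuple[str, str, str, str]


def error_tree(errors: Iterable[Error]) -> dict:
    """Nest error counts as recognizer -> noise -> error type -> speaker."""
    tree: dict = {}
    for recognizer, noise, etype, speaker in errors:
        if etype not in ("S", "I", "O"):
            raise ValueError(f"unknown error type {etype!r}")
        leaf = (tree.setdefault(recognizer, {})
                    .setdefault(noise, {})
                    .setdefault(etype, Counter()))
        leaf[speaker] += 1
    return tree
```

These leaves are raw totals rather than averages over erroneous sentences, so they can be compared directly between recognizers.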




These results, in contrast to the previous error trees, represent the
total number of errors for each speaker by error type, noise, and
recognizer, and can therefore be compared accurately between
recognizers. However, the results again do not distinguish
"consistent" recognizer error, where the same mistakes are always
made, from "random" errors. The distinction matters because
consistent recognizer error would be much easier to correct
(typically by retraining the particular digit template). The error
trees also do not weight the different types of errors in any way
when obtaining the average errors per speaker score (SE). The
resulting score is somewhat misleading, in that correcting
substituted digits would be significantly harder than correcting
insertions or omissions.
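The unweighted SE score, and one way to weight it, can be sketched as follows. With no weights, every error counts once, as in the error trees. With weights, each error type is scaled by its weight. The weight values in the usage line are purely hypothetical and are not taken from this study. They only show how a harder-to-correct substitution could be made to count more.

```python
from typing import Mapping, Optional


def errors_per_speaker(counts: Mapping[str, Mapping[str, int]],
                       weights: Optional[Mapping[str, float]] = None) -> float:
    """SE for one recognizer/noise cell.

    counts maps speaker -> {error type ("S", "I", "O") -> count}.
    With weights=None every error counts 1; otherwise each error type
    is multiplied by its weight (types without a weight count 1).
    """
    if not counts:
        raise ValueError("no speakers in this cell")
    w = weights or {}
    total = sum(n * w.get(etype, 1.0)
                for per_type in counts.values()
                for etype, n in per_type.items())
    return total / len(counts)


cell = {"s1": {"S": 2, "I": 1}, "s2": {"O": 1}, "s4": {}}
print(errors_per_speaker(cell))              # unweighted: 4 errors / 3 speakers
print(errors_per_speaker(cell, {"S": 2.0}))  # hypothetical weighting: 6 / 3 = 2.0
```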


X.  Summary  and  Conclusions 


While there is no conclusive evidence that any of the recognizers
tested consistently outperformed the others, some important
conclusions can be drawn about the recognition of connected speech
in general.

The "care and feeding" of speakers is all-important. This point
cannot be emphasized too strongly. Performance appears to vary with
the mood, motivational level, and frustration level experienced by
speakers. Systems (and applications) must be designed to minimize
these performance moderators. The notion that any worker can use a
recognition system with just a few hours of orientation is wrong, and
acting on it may result in the failure of a project that might
otherwise be a success.

At the same time, designers and manufacturers of ASRs must pay
attention to the extraction of linguistically significant information
from the speech signal. Humans have little trouble understanding
other humans when they are angry, sick, or sobbing. The information
is present in the speech signal; it remains to be used.


Training time varied somewhat among recognizers. For example, the
enrollment procedures for both the Verbex and the Interstate were
considerably longer and more tedious than for the ITT. Users of these
systems are likely to need a substantial amount of retraining as
long-term shifts in their speech occur. As a result, minimal training
time is very desirable, because it reduces the non-productive time an
employee spends on the system.

None of the devices tested could be used in a speaker independent
setting, nor did any claim to be speaker independent. Although speaker
independence is often thought to be necessary for application value,
the larger vocabularies of the speaker dependent systems make them
useful today for many tasks.

Moderate-vocabulary systems are available from a number of vendors,
and these systems should be capable of supporting educational as well
as performance maintenance roles.

Speaker independent systems are currently limited to small
vocabularies and are probably not robust enough to support classroom
or field applications. Watch this group: much work is being done, and
there will likely be some major progress in the short term.

Speaker dependent discrete systems are currently the most noise
tolerant. With proper design, they will likely be adequate for most
classroom or field applications.

Speaker dependent connected recognizers are becoming much less noise
sensitive. As this evolution proceeds, they will likely be perceived
as more appropriate for all applications. However, although it is
widely assumed that humans interact better with a connected speech
recognizer, there is currently no evidence that they do.

The wide variation in the software support provided by the vendors
makes "porting" applications from system to system difficult. A very
useful research and development task would be to develop an
"application generator" that not only supported a range of products
from different vendors, but also encouraged the voice system
integrator to fully consider the many application design issues
(e.g., prompting, help, editing, and error recovery).

Finally, the technology is certainly mature enough to support both
training and maintenance applications. In such a scenario, most of
the voiced input would be commands, so even discrete speech
recognizers would function well. Adding an intelligent post-processor
to further filter the input would likely reduce most recognizer errors
to a matter of verification rather than editing or re-entry.
Unfortunately, this post-processing function is not available in a
generic format and will require application-specific development. The
basic premises of such a system are, however, known. We suggest that
the next step be to develop an application prototype and, through
that prototype, to define the requirements for an error detection and
correction post-processor.
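A deliberately small sketch can illustrate the basic premise of such a post-processor. It compares the recognized word string with the commands that the application's syntax allows. An exact match is accepted. When exactly one allowed command differs by a single word, that command is proposed for verification instead of forcing re-entry. The command set, the single-substitution rule, and the name `post_process` are all hypothetical. A real post-processor would need application-specific rules, as noted above.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical command syntax for a training/maintenance application.
ALLOWED: List[List[str]] = [
    "driver move tank out slower".split(),
    "driver move tank out faster".split(),
    "driver stop tank".split(),
]


def post_process(recognized: Sequence[str],
                 allowed: Sequence[List[str]] = ALLOWED
                 ) -> Tuple[str, Optional[List[str]]]:
    """Return ("accept", cmd), ("verify", cmd) or ("reenter", None)."""
    words = list(recognized)
    if words in allowed:
        return "accept", words
    # Allowed commands of the same length that differ in exactly one word.
    near = [cmd for cmd in allowed
            if len(cmd) == len(words)
            and sum(a != b for a, b in zip(cmd, words)) == 1]
    if len(near) == 1:
        return "verify", near[0]  # ask the user to confirm the repaired command
    return "reenter", None        # no match, or the repair is ambiguous
```

A misrecognized "driver move tank oat faster" is turned into a verification prompt. An input such as "driver move tank out fast" is one word away from two different commands, so it still requires re-entry. The sketch is conservative in that respect.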


XI. Glossary


Application generator - Software that can automatically generate the
programs needed to support an application, based on the design
requirements given as input.

Coarticulation  -  The  phenomenon  observed  when  pronouncing  two 
words  together  results  in  the  component  sounds  being  changed. 

Connected  speech  -  Speaking  words  fully  and  distinctly  with  no 
unnatural  pauses  between  them. 

Continuous  speech  -  Speech  as  typified  by  human  to  human  speech. 
Words  are  often  run  on  and  sounds  are  missing. 


dBA - Sound pressure measurement (in decibels) using the A
weighting scale.

dBC - Sound pressure measurement (in decibels) using the C
weighting scale.

Discrete speech - Speech in which each word is fully and distinctly
pronounced, with short pauses between words.

Dynamic update - The process of updating speech recognition templates
during performance, without the need to enter a separate performance
maintenance process.


Enrollment - The process of training the speech recognition system to
the user's voice. Templates are extracted from prompted speech for use
in future comparisons.

Form factor - Physical attributes of a system. These attributes
determine which host systems are compatible (i.e., will the board fit?).

Front  end  gain  -  Amplification  applied  to  the  signal  provided  by 
the  microphone. 

Front-end amplifier - Amplifier that provides front end gain.

Intensity - Amplitude of speech or noise, usually expressed in dB.

Leq - Equivalent or perceived loudness.

Loudness  -  Perceived  intensity. 

Match score - The degree to which an utterance matches a stored
template.


Multibus - Computer backplane standard.


PC  bus  -  Computer  backplane  standard. 

RS-232  -  Communications  protocol  standard  (this  one  is  not  always 
interpreted  the  same  by  all  vendors). 

Speaker  dependent  -  Speech  recognition  in  which  the  user  must 
have  enrolled  speech  patterns. 

Speaker independent - Speech recognition in which utterances are
identified using generic information.

Speech  onset  -  The  start  of  an  utterance;  nominally  when  the 
energy  level  increases  above  ambient. 

Speech template - A pattern derived from speech, against which future
utterances will be compared.

Speech termination - The end of an utterance; nominally when the
energy level returns to ambient.

Syntaxing - The specification of rules that identify the possible
(allowed) sequences of words in the vocabulary.

Template - Stored pattern derived from speech during training,
against which future utterances are compared.

Token - Often used interchangeably with template, but usually a
template derived from a single utterance.

Transaction  generator  -  Software  that  automatically  generates  the 
necessary  programs  to  support  application  transactions.  This  is 
a  subset  of  application  generators. 

Tukey test - A statistical test to isolate sources of significance
from pooled information.


XII.  References 




Baker, J.M. (1982). The performing arts -- how to measure up. In
D.S. Pallett (Ed.), Proceedings of the Workshop on Standardization
for Speech I/O Technology. Gaithersburg, MD: National Bureau of
Standards.

Berger, E.H., Ward, W.B., Morrill, J.C., & Royster, L.H. (1986).
Noise and hearing conservation manual. American Industrial
Hygiene Association.

Black, J.W. (1950). The effect of room characteristics upon vocal
intensity and rate. The Journal of the Acoustical Society of
America, 22(2), 174-176.

Borden, G.J. (1979). An interpretation of research on feedback
interruption in speech. Brain and Language, 7, 307-319.

Doddington, G.R., & Schalk, T.B. (1981). Speech recognition:
Turning theory to practice. IEEE Spectrum, 18, 26-32.

Draegert, G.L. (1951). Relationships between voice variables and
speech intelligibility in high level noise. Speech Monographs,
18, 272-278.

Dunnett, C.W. (1980). Pairwise multiple comparisons in the
homogeneous variance, unequal sample size cases. Journal of the
American Statistical Association, 75, 372.

Garber, S.F., Siegel, G.M., Pick, H.L., & Alcorn, S.R. (1976). The
influence of selected masking noises on Lombard and sidetone
amplification effects. Journal of Speech and Hearing Research,
19, 523-535.

Goodnight, J.H. (1971). The new General Linear Models procedure.
Proceedings of the First International SAS Users' Meeting.

Howell, K., & Martin, A.M. (1975). An investigation of the effects
of hearing protectors on vocal communication in noise. Journal of
Sound and Vibration, 41(2), 181-196.

Kryter, K.D. (1946). Effects of ear protective devices on the
intelligibility of speech in noise. Journal of the Acoustical
Society of America, 18(2), 413-417.

Lane, H.L., & Tranel, B. (1971). The Lombard sign and the role of
hearing in speech. Journal of Speech and Hearing Research, 14,
677-709.




Larson, N., Moody, T., & Joost, M. (1986). The effects of
background noise on ASR performance using inertial and headset
microphones. North Carolina State University Technical Report
No. TR-IE-86-7.

Nusbaum, H.C., Davis, C.N., Pisoni, D.B., & Davis, E. (1986).
Testing the performance of isolated utterance speech recognition
devices. Proceedings of AVIOS '86, 393-408.

Pallett, D.S. (1985). Performance Assessment of Automatic Speech
Recognizers. Journal of Research of the National Bureau of
Standards, 90(5) (Sep./Oct.).

Peterson, A.P. (1980). Handbook of noise measurement. Concord,
Mass.: GenRad, Inc.

Pisoni, D.B., Bernacki, R.H., Nusbaum, H.C., & Yuchtman, M.
(1985). Some acoustic-phonetic correlates of speech produced
in noise. IEEE, 1581-1584.

Plice, G.W. (1983). Choosing a microphone. Speech Technology, 2
(Sept./Oct.), 17.

Rollins, A., & Wiesen, J. (1983). Speech recognition and noise.
ICASSP, 523-526.

Sall, J.P. (1978). SAS regression applications. SAS Technical
Report A-102. Raleigh: SAS Institute.

Searle, S.R. (1971). Linear Models. New York: John Wiley and Sons.

Scheffe, H. (1959). The Analysis of Variance. New York: John
Wiley and Sons.

Siegel, G.M., & Pick, H.L. (1974). Auditory feedback in the
regulation of voice. Journal of the Acoustical Society of
America, 56(5), 1618-1624.

Tukey, J.W. (1952). Allowances for Various Types of Error Rates.
Unpublished IMS address, Chicago, Illinois.

Waller, H.F. (1985). Choosing the right microphone for speech
applications. Proceedings of Speech Tech '85 (p. 45).

Webster, J.C., & Klumpp, R.G. (1962). Effects of ambient noise
and nearby talkers on a face-to-face communication task. The
Journal of the Acoustical Society of America, 34(7), 936-941.


Appendix I. Vendor List


Product name, contact and address:


Dr. H. Mangold

AEG-Telefunken Nachrichtentechnik GmbH
Postfach 1120, 7150 Backnang, West Germany


Description of Product Capabilities:

Speaker dependent or independent?

Type of speech:

Method of speech recognition:

Training Method:

Vocabulary Limitations

Number of words in active vocabulary:
Vocabularies in system:
Word length limit:

Built in syntaxing:

Response time:

Minimum time between utterances:
Templates updated continuously:

Compatibility of System

System compatibility:

Languages Supported:

Programming required:

Microphone / Telephone information

Telephone access:

Recommended microphone and/or jack type:


Testing of ASR

Independent Tests:

Tests in noise:

Existing Applications:

Price and size information

Price:

Size of system:

Customer support:


Product name, contact and address:


AT&T's Conversant 1 Voice System
Dr. Christopher D. Farrar

AT&T

6200 East Broad Street, Columbus, Ohio 43213
614-860-3278 or 800-341-2272

Description of Product Capabilities:

Speaker dependent or independent? Both
Type of speech: Isolated and Connected

Method of speech recognition: template with phonetic enhancements
Training Method: 2 to 4 for dependent

Vocabulary Limitations

Number of words in active vocabulary: discrete independent: digits, yes, no, oh; connected independent: digits, yes, no; dependent: 256 words
Vocabularies in system: n/a

Word length limit: discrete - max 2.01 sec; others application dependent

Built in syntaxing: optional

Response time: 2b0 ms maximum to next prompt

Minimum time between utterances: for dependent, programmable; default 195 ms

Templates updated continuously: no

Compatibility of System

System compatibility: Stand alone; Unix operating system; asynchronous, bisynchronous 3270 & SNA/SDLC
Languages Supported: C
Programming required: none required

Microphone / Telephone information

Telephone access: yes

Recommended microphone and/or jack type:

telephone

Testing of ASR

Independent Tests: not available

Tests in noise: no specific testing, but meant for telephone lines

Existing Applications: yes, stock quotation

Price and size information

Price: pricing on individual basis; volume discounts available to VARs
Size of system: 25x22x15, 100 lbs

Customer Support: training and warranty


Product name, contact and address:


SSB-1000 Speech Recognition Board

Mr. Arthur 9. Celona

AUDEC

299 Market Street, Saddle Brook, New Jersey 07662
201-368-3848

Description of Product Capabilities:

Speaker dependent or independent? dependent

Type of speech: discrete

Method of speech recognition: template

Training Method: one pass for enrollment, 2 additional training passes

Vocabulary Limitations

Number of words in active vocabulary: 144
Vocabularies in system:

Word length limit: 2 sec w/ 150 ms gap between words
Built in syntaxing: application dependent
Response time: 250-300 ms

Minimum time between utterances: 150 ms
Templates updated continuously: yes

Compatibility of System

System compatibility: any computer with RS-232 port or 8 bit parallel port. Can stand alone.
Languages Supported: Macro commands, 6502 assembly language, any resident language for host system
Programming required: not required

Microphone / Telephone information

Telephone access: yes, with additional design
Recommended microphone and/or jack type:
none recommended

Testing of ASR

Independent Tests: none
Tests in noise: not defined

Existing Applications: Telephone, Remote equipment management, toys

Price and size information

Price: $250 with discounts for multiple purchase
Size of system: 5 in x 5 in; < one pound

Customer Support: yes


Product name, contact and address:


Philip T. Mclaughlin
Audopilot

19 Antoine Court, Huntington, New York 11743
516-351-4862

Description of Product Capabilities:

Speaker dependent or independent?

Type of speech:

Method of speech recognition:

Training Method:

Vocabulary Limitations

Number of words in active vocabulary:
Vocabularies in system:

Word length limit:

Built in syntaxing:

Response time:

Minimum time between utterances:

Templates updated continuously:

Compatibility of System

System compatibility:

Languages Supported:

Programming required:

Microphone / Telephone information

Telephone access:

Recommended microphone and/or jack type:


Testing of ASR

Independent Tests:

Tests in noise:

Existing Applications:

Price and size information

Price:

Size of system:

Customer Support:


Product name, contact and address:


Calltalk DVIO Model 100
Mr. J. Levenberg
Calltalk LTD

Hamasger 56, Tel-Aviv, Israel 67214


Description of Product Capabilities:

Speaker dependent or independent? dependent
Type of speech: continuous speech
Method of speech recognition: templates
Training Method:

Vocabulary Limitations

Number of words in active vocabulary: 500 words
Vocabularies in system:

Word length limit:

Built in syntaxing:

Response time: less than 400 ms

Minimum time between utterances:

Templates updated continuously:

Compatibility of System

System compatibility:

Languages Supported:

Programming required:

Microphone / Telephone information

Telephone access:

Recommended microphone and/or jack type:


Testing of ASR

Independent Tests:

Tests in noise:

Existing Applications:

Price and size information

Price:

Size of system: 17x6.5x22.5, 55.5 lbs

Customer Support: yes


Product name, contact and address:


Mr. Barry Cohen
CE Electronics

481 Eighth Avenue, Suite 726, New York, New York 10001


Description of Product Capabilities:

Speaker dependent or independent?

Type of speech:

Method of speech recognition:

Training Method:

Vocabulary Limitations

Number of words in active vocabulary:
Vocabularies in system:

Word length limit:

Built in syntaxing:

Response time:

Minimum time between utterances:
Templates updated continuously:

Compatibility of System

System compatibility:

Languages Supported:

Programming required:

Microphone / Telephone information

Telephone access:

Recommended microphone and/or jack type:


Testing of ASR

Independent Tests:

Tests in noise:

Existing Applications:

Price and size information

Price:

Size of system:

Customer Support:


Product name, contact and address:


Voicescribe 1000
Dr. Janet Baker
Dragon Systems, Inc.

55 Chapel Street, Newton, MA 02158
617-965-5200

Description of Product Capabilities:

Speaker dependent or independent? dependent
Type of speech: isolated
Method of speech recognition: template
Training Method: train each word

Vocabulary Limitations

Number of words in active vocabulary: 1000
Vocabularies in system: n/a
Word length limit:

Built in syntaxing:

Response time: near real time

Minimum time between utterances:

Templates updated continuously:

Compatibility of System

System compatibility: IBM PC/XT or AT
Languages Supported:

Programming required:

Microphone / Telephone information

Telephone access:

Recommended microphone and/or jack type:


Testing of ASR

Independent Tests:

Tests in noise:

Existing Applications:

Price and size information

Price: $200 minimum 1,000 units
Size of system:

Customer Support:


Product name, contact and address:


Voicescribe 20000
Dr. Janet Baker
Dragon Systems, Inc.

55 Chapel Street, Newton, MA 02158
617-965-5200

Description of Product Capabilities:

Speaker dependent or independent? independent
Type of speech: isolated
Method of speech recognition: phonetic
Training Method: 30 minutes

Vocabulary Limitations

Number of words in active vocabulary: 20,000
Vocabularies in system: n/a
Word length limit:

Built in syntaxing:

Response time:

Minimum time between utterances:

Templates updated continuously:

Compatibility of System

System compatibility: IBM PC XT or AT
Languages Supported:

Programming required:

Microphone / Telephone information

Telephone access:

Recommended microphone and/or jack type:


Testing of ASR

Independent Tests:

Tests in noise:

Existing Applications:

Price and size information

Price: $500 minimum 1,000 units
Size of system:

Customer Support:


Product name, contact and address:


Mr. Yasuo Sato
Fujitsu, Ltd.

1015 Kami-Odanaka, Nakahara-ku, Kawasaki 211, Japan


Description of Product Capabilities:

Speaker dependent or independent?

Type of speech:

Method of speech recognition:

Training Method:

Vocabulary Limitations

Number of words in active vocabulary:
Vocabularies in system:

Word length limit:

Built in syntaxing:

Response time:

Minimum time between utterances:
Templates updated continuously:

Compatibility of System

System compatibility:

Languages Supported:

Programming required:

Microphone / Telephone information

Telephone access:

Recommended microphone and/or jack type:


Testing of ASR

Independent Tests:

Tests in noise:

Existing Applications:

Price and size information

Price:

Size of system:

Customer Support:


Product name, contact and address:


Most activity is IR&D & CRAD in support of Defense Department - no product
Dr. John R. Damoulakis
Gould Electronics

40 Gould Center, Rolling Meadows, Ill. 60008
312-640-4400

Description of Product Capabilities:

Speaker dependent or independent? dependent

Type of speech: isolated or connected

Method of speech recognition: template

Training Method: 1 to 5 times inserting individual words

Vocabulary Limitations

Number of words in active vocabulary: 256

Vocabularies in system: 256

Word length limit: minimum word length 0.1 sec.

Built in syntaxing: none

Response time: 200 ms at low noise; 500 ms at 0 dB SNR

Minimum time between utterances: 200 ms

Templates updated continuously: yes, environmentally adaptive

Compatibility of System

System compatibility: Special purpose stand alone; operational on VAX 11/780
Languages Supported: Fortran, C, Pascal
Programming required: none

Microphone / Telephone information

Telephone access: not tested yet
Recommended microphone and/or jack type:
flexible, minimum telephone bandwidth

Testing of ASR

Independent Tests: none

Tests in noise: many tests completed in noise

Existing Applications: experimental and evaluation only at this time

Price and size information

Price: Quotation

Size of system: .35 cubic ft

Customer Support: customized products; support negotiated in cont


Product name, contact and address:


Mr. Akira Ichikawa

Hitachi, Ltd.

1-280 Higashi-Koigakubo, Kokubunji, Tokyo 185, Japan


Description of Product Capabilities:

Speaker dependent or independent?

Type of speech:

Method of speech recognition:

Training Method:

Vocabulary Limitations

Number of words in active vocabulary:
Vocabularies in system:

Word length limit:

Built in syntaxing:

Response time:

Minimum time between utterances:
Templates updated continuously:

Compatibility of System

System compatibility:

Languages Supported:

Programming required:

Microphone / Telephone information

Telephone access:

Recommended microphone and/or jack type:


Testing of ASR

Independent Tests:

Tests in noise:

Existing Applications:

Price and size information

Price:

Size of system:

Customer Support:


Product name, contact and address:

Voice Communication Adapter
Mr. Fred McNeese
IBM
IBM Entry Systems Division, Boca Raton, Florida 33432

Description of Product Capabilities:

Speaker dependent or independent? dependent
Type of speech: discrete
Method of speech recognition: template
Training Method: user defined, 4 utterances recommended

Vocabulary Limitations

Number of words in active vocabulary: 64
Vocabularies in system: up to 5
Word length limit: 2 seconds
Built in syntaxing: user defined
Response time: real time
Minimum time between utterances: 'brief pause'
Templates updated continuously: no

Compatibility of System

System compatibility: IBM PC
Languages Supported: has transparent keyboard
Programming required: none required

Microphone / Telephone information

Telephone access: yes
Recommended microphone and/or jack type: high impedance with 2.5 mm connector

Testing of ASR

Independent Tests: yes
Tests in noise: yes
Existing Applications:

Price and size information

Price:
Size of system: board
Customer Support: yes




Product name, contact and address:

iSBC 570
Dan Fink
Intel Corp.
3065 Bowers Avenue, Santa Clara, CA 95051
408-987-8080

Description of Product Capabilities:

Speaker dependent or independent? dependent
Type of speech: isolated
Method of speech recognition: template
Training Method: three training passes suggested

Vocabulary Limitations

Number of words in active vocabulary: 200
Vocabularies in system: n/a
Word length limit: up to 2 seconds
Built in syntaxing: user defined
Response time: real time
Minimum time between utterances: varied, user defined
Templates updated continuously: yes

Compatibility of System

System compatibility: Multibus channel, serial channel and local channel
Languages Supported: C
Programming required: speech transaction files

Microphone / Telephone information

Telephone access: no
Recommended microphone and/or jack type: female jack for Shure SM10 microphone

Testing of ASR

Independent Tests: yes
Tests in noise: yes
Existing Applications: yes

Price and size information

Price: unknown
Size of system: 6.5x17x22 (60 lbs.)
Customer Support: yes


Product name, contact and address:

Vocalink S4000
Mr. Brundage
Interstate Voice Products
1849 West Sequoia Ave, Orange, CA 92668
714-937-9010

Description of Product Capabilities:

Speaker dependent or independent? dependent
Type of speech: continuous
Method of speech recognition: template
Training Method: system controlled

Vocabulary Limitations

Number of words in active vocabulary: 100 words
Vocabularies in system: multiple
Word length limit: .1 to 2.0 seconds; 15 chars/word
Built in syntaxing: yes
Response time: < 300 ms
Minimum time between utterances: n/a
Templates updated continuously: no

Compatibility of System

System compatibility: PC DOS or MS DOS
Languages Supported: all supported by DOS
Programming required: grammar and translation files defined by user

Microphone / Telephone information

Telephone access: no
Recommended microphone and/or jack type: headset or wireless

Testing of ASR

Independent Tests: yes
Tests in noise: yes
Existing Applications: yes

Price and size information

Price:
Size of system: 17x4x12, 15 lbs.
Customer Support: yes


Product name, contact and address:

Multibus CSR
Richard C. Sadler
ITT Defense Communications Division
492 River Road, Nutley, New Jersey 07110
201-284-4234

Description of Product Capabilities:

Speaker dependent or independent? dependent
Type of speech: both
Method of speech recognition: template
Training Method: system defined initially, but user defined # of utterances/word

Vocabulary Limitations

Number of words in active vocabulary: 40 active templates; 200 to 300 words
Vocabularies in system: 3, but expandable to 30
Word length limit: n/a
Built in syntaxing: user programmable syntax, 60 nodes, 290 words/node
Response time: <25 sec
Minimum time between utterances: n/a
Templates updated continuously: no

Compatibility of System

System compatibility: Venix/86 OS
Languages Supported: C and assembly
Programming required: system defined for grammar

Microphone / Telephone information

Telephone access: no
Recommended microphone and/or jack type: application dependent

Testing of ASR

Independent Tests: yes
Tests in noise: yes
Existing Applications: yes

Price and size information

Price:
Size of system:
Customer Support: yes




Product name, contact and address:

Kurzweil VoiceSystem
Mr. Bod Joined
Kurzweil Applied Intelligence, Inc.
411 Waverley Oaks Road, Waltham, Ma. 02154
617-M3-5151

Description of Product Capabilities:

Speaker dependent or independent? dependent, limited independent
Type of speech: isolated
Method of speech recognition: template plus other proprietary algorithms
Training Method: one to three times for each utterance

Vocabulary Limitations

Number of words in active vocabulary: 1000
Vocabularies in system: multiple
Word length limit: up to several seconds
Built in syntaxing: optional, user developed
Response time: <.5 sec
Minimum time between utterances: 60-180 ms
Templates updated continuously: no

Compatibility of System

System compatibility: IBM PC compatible; connects to ASCII & 3270 hosts w/o modification to host s/w
Languages Supported: KVS libraries written in C can be linked w/ objects produced by other languages
Programming required: none required

Microphone / Telephone information

Telephone access: limited
Recommended microphone and/or jack type: 1 pin DIN connector, headset & handset available

Testing of ASR

Independent Tests: yes
Tests in noise: reliable in high continuous noise environments
Existing Applications: yes

Price and size information

Price: KVS AA 16500, volume discounts available
Size of system: 14x6.5x8, 18 lbs
Customer Support: yes




Product name, contact and address:

Voice-Macros
Ellon L. Clark
Microphonics
25 37th St. N.E., Suite B, Auburn, Wa. 98002
208-939-2321 800-325-9206

Description of Product Capabilities:

Speaker dependent or independent? dependent
Type of speech: discrete
Method of speech recognition: template
Training Method: 1 pass

Vocabulary Limitations

Number of words in active vocabulary: 128
Vocabularies in system:
Word length limit: 2 seconds
Built in syntaxing: no
Response time:
Minimum time between utterances:
Templates updated continuously:

Compatibility of System

System compatibility: IBM PC, XT, AT
Languages Supported: DOS compatible
Programming required: DOS compatible

Microphone / Telephone information

Telephone access:
Recommended microphone and/or jack type:

Testing of ASR

Independent Tests:
Tests in noise: yes
Existing Applications:

Price and size information

Price:
Size of system: board
Customer Support:




Product name, contact and address:

Mr. Jun Oyamada
NEC America Inc.
8 Old Sod Farm Road, Melville, New York 11747
516-753-7000

Description of Product Capabilities:

Speaker dependent or independent? dependent, limited independent
Type of speech: isolated and connected
Method of speech recognition: template
Training Method:

Vocabulary Limitations

Number of words in active vocabulary: 500 words
Vocabularies in system:
Word length limit:
Built in syntaxing:
Response time:
Minimum time between utterances:
Templates updated continuously:

Compatibility of System

System compatibility:
Languages Supported:
Programming required:

Microphone / Telephone information

Telephone access:
Recommended microphone and/or jack type:

Testing of ASR

Independent Tests:
Tests in noise: 80 - 85 dB
Existing Applications:

Price and size information

Price: $598 - $19,995
Size of system:
Customer Support: yes




Product name, contact and address:

Austin Bordeaux
RDA Logicon R&D Associates
P.O. Box 9A95, 4640 Admiralty Way, Marina Del Rey, CA 90295
213-822-1715

Description of Product Capabilities:

Speaker dependent or independent?
Type of speech:
Method of speech recognition:
Training Method:

Vocabulary Limitations

Number of words in active vocabulary:
Vocabularies in system:
Word length limit:
Built in syntaxing:
Response time:
Minimum time between utterances:
Templates updated continuously:

Compatibility of System

System compatibility:
Languages Supported:
Programming required:

Microphone / Telephone information

Telephone access:
Recommended microphone and/or jack type:

Testing of ASR

Independent Tests:
Tests in noise:
Existing Applications:

Price and size information

Price:
Size of system:
Customer Support:


Product name, contact and address:

Coretecha ft? 3 Speech Terminal
lapn Laffitta
Scott Instruments Corp.
1111 Willow Springs Drive, Denton, Texas 76201
117-317-9514

Description of Product Capabilities:

Speaker dependent or independent? both
Type of speech: connected and discrete
Method of speech recognition: template (unique representation of spoken word)
Training Method: 1 pass

Vocabulary Limitations

Number of words in active vocabulary: 200 1/2-sec words and 100 recording
Vocabularies in system: 1
Word length limit: 3 seconds discrete; 1 second connected
Built in syntaxing: yes
Response time: .25 seconds, software selectable
Minimum time between utterances: .25 seconds, software selectable
Templates updated continuously: no

Compatibility of System

System compatibility: any computer w/ RS232 communications
Languages Supported: any
Programming required: none required

Microphone / Telephone information

Telephone access: yes
Recommended microphone and/or jack type: liroaa KflO-78-4

Testing of ASR

Independent Tests: yes, resulted in purchase of Scott VET 1
Tests in noise: up to 110 dB noise
Existing Applications: 0004 data gathering

Price and size information

Price: $1995.00 list; VU and distributor pricing available
Size of system: desktop I7.(el(. 3*4.5 1( 'be; rack mount 19. 0il( 3*5.22
Customer Support: yes


Product name, contact and address:

SSI's Phonetic Engine
Leonard L. Backus/Donna J. Hutchison (818) 881-0885
Speech Systems Inc.
18356 Oxnard Street, Tarzana, California 91356
617-639-2360

Description of Product Capabilities:

Speaker dependent or independent? dependent
Type of speech: continuous
Method of speech recognition: phonetic
Training Method: 20 minutes, optional (increases accuracy)

Vocabulary Limitations

Number of words in active vocabulary: 5000
Vocabularies in system: all can be accessed
Word length limit: n/a
Built in syntaxing: yes
Response time: phonetics produced in real time
Minimum time between utterances: n/a
Templates updated continuously: no, templates not used

Compatibility of System

System compatibility: Phonetic process software in C; currently on VAX and SUN systems
Languages Supported:
Programming required: user inputs to syntax and dictionary utilizing SSI tools

Microphone / Telephone information

Telephone access:
Recommended microphone and/or jack type: proprietary handset/telephone type

Testing of ASR

Independent Tests:
Tests in noise: no
Existing Applications: development applications include command & control, limited dictation, AI

Price and size information

Price: depends on configuration of development and system
Size of system:
Customer Support: yes, cost will be minimal




Product name, contact and address:

TI-Speech Development System
Mr. Doug Palmer
Texas Instruments Inc. M/S 2081
P.O. Box 2908, Austin, Texas 78769
512-250-6005

Description of Product Capabilities:

Speaker dependent or independent? dependent
Type of speech: both
Method of speech recognition: template
Training Method: system defined

Vocabulary Limitations

Number of words in active vocabulary: 50 words
Vocabularies in system: 1000 words total
Word length limit: n/a
Built in syntaxing: user defined
Response time: real time
Minimum time between utterances: n/a
Templates updated continuously: no

Compatibility of System

System compatibility: IBM PC and TI
Languages Supported: MS-Basic, MS-Pascal, Lattice C, IQ Lisp, Compiled Basic
Programming required: grammar structures

Microphone / Telephone information

Telephone access: available
Recommended microphone and/or jack type: 1/4 inch jack - hand held mike

Testing of ASR

Independent Tests: yes
Tests in noise: yes
Existing Applications:

Price and size information

Price: 11155
Size of system: board
Customer Support:




Product name, contact and address:

TOSVOICE
Dr. Sadakazu Watanabe
Toshiba Corp.
1, Komukai Toshiba-cho, Saiwai-ku, Kawasaki-City, Kanagawa 210, Japan
Kawasaki 044-511-2111

Description of Product Capabilities:

Speaker dependent or independent? independent
Type of speech: discrete
Method of speech recognition: both are used
Training Method: n/a

Vocabulary Limitations

Number of words in active vocabulary: 64
Vocabularies in system: 64
Word length limit: 4 sec
Built in syntaxing: optional
Response time: 200 msec
Minimum time between utterances: 1 sec
Templates updated continuously: n/a

Compatibility of System

System compatibility: DOS; PL-40
Languages Supported: PL-40, Fortran
Programming required: none

Microphone / Telephone information

Telephone access: yes
Recommended microphone and/or jack type: telephone; Shure SM12; Cannon connector

Testing of ASR

Independent Tests: yes
Tests in noise: 75 dB(A)
Existing Applications: yes

Price and size information

Price: unknown
Size of system: 500x900x900 mm; 50 kg
Customer Support: no


Product name, contact and address:

Verbex Series 4000
Mr. Chris Seelbach
Verbex/Voice Industries Corp.
10 Madison Ave., Morristown, New Jersey 07960
201-267-7507

Description of Product Capabilities:

Speaker dependent or independent? dependent
Type of speech: continuous
Method of speech recognition: template
Training Method: system defined

Vocabulary Limitations

Number of words in active vocabulary: 100 words
Vocabularies in system: multiple
Word length limit: 15 characters
Built in syntaxing: yes
Response time: <300 ms
Minimum time between utterances: n/a
Templates updated continuously: no

Compatibility of System

System compatibility: IBM PC compatible
Languages Supported: all supported by DOS
Programming required: grammars and translation tables

Microphone / Telephone information

Telephone access: no
Recommended microphone and/or jack type: Shure VR-230

Testing of ASR

Independent Tests: yes
Tests in noise: yes
Existing Applications: yes

Price and size information

Price: unknown
Size of system: 17x4x12, 15 lbs
Customer Support: yes


Product name, contact and address:

VCS Technology
Dr. R.E. Helms
Voice Control Systems
16610 Dallas Parkway, Dallas, Texas 75248
214-248-8244

Description of Product Capabilities:

Speaker dependent or independent? independent
Type of speech: discrete
Method of speech recognition: phonetic
Training Method: n/a

Vocabulary Limitations

Number of words in active vocabulary: 20
Vocabularies in system: 1 kbyte/vocabulary words
Word length limit: 1.5 seconds
Built in syntaxing: optional
Response time: 250 msec
Minimum time between utterances: n/a
Templates updated continuously: n/a

Compatibility of System

System compatibility: stand alone
Languages Supported: application specific
Programming required: application specific

Microphone / Telephone information

Telephone access: yes
Recommended microphone and/or jack type: application-specific

Testing of ASR

Independent Tests: unknown
Tests in noise: yes; specific versions have been developed
Existing Applications: yes

Price and size information

Price: cost to produce is app. 3100
Size of system: 35 square inches
Customer Support: yes


Product name, contact and address:

VPC 2100
Mr. Bruce Byon
Votan
4487 Technology Drive, Fremont, CA 94538-6343
415-490-7600

Description of Product Capabilities:

Speaker dependent or independent? dependent
Type of speech: both
Method of speech recognition: template
Training Method: discrete words; 2 utterances recommended; can extract continuous phrases

Vocabulary Limitations

Number of words in active vocabulary: 80 (more with fewer training passes or optional expansion vocabulary)
Vocabularies in system: n/a
Word length limit: n/a
Built in syntaxing: no, available through vocabulary subsets
Response time: real time
Minimum time between utterances: n/a
Templates updated continuously: no

Compatibility of System

System compatibility: IBM PCs and compatibles
Languages Supported: C86
Programming required: none

Microphone / Telephone information

Telephone access: yes
Recommended microphone and/or jack type: gooseneck and handheld

Testing of ASR

Independent Tests: yes
Tests in noise: yes
Existing Applications: yes

Price and size information

Price:
Size of system: board
Customer Support: yes




Product name, contact and address:

Series 100 Voice Data Collection System
V.A. Hardister
Westinghouse Electric Corporation
Southern Regional Office, One Anoilsood Place, Asheville, N.C. 28804
'04-645-422.

Description of Product Capabilities:

Speaker dependent or independent? dependent
Type of speech: continuous
Method of speech recognition: template
Training Method:

Vocabulary Limitations

Number of words in active vocabulary: 200
Vocabularies in system:
Word length limit: n/a
Built in syntaxing: yes
Response time: real time
Minimum time between utterances: n/a
Templates updated continuously: no

Compatibility of System

System compatibility:
Languages Supported: n/a
Programming required: system defined

Microphone / Telephone information

Telephone access:
Recommended microphone and/or jack type:

Testing of ASR

Independent Tests:
Tests in noise:
Existing Applications:

Price and size information

Price:
Size of system:
Customer Support:


Product name, contact and address:

Seraphine
Mr. Herve Couturier
XCOM
BP 29 Montbonnot Saint Martin, Saint-Ismier, France 38330
76-52-00-46

Description of Product Capabilities:

Speaker dependent or independent? both
Type of speech: discrete and connected
Method of speech recognition: templates for each speaker
Training Method: single pass for individual words

Vocabulary Limitations

Number of words in active vocabulary: 100 words
Vocabularies in system: 100-200 words
Word length limit: 6 seconds; up to 7 words
Built in syntaxing: yes
Response time: size and syntax dependent; 1 sec for 0 to 999 recognition
Minimum time between utterances: 1 sec
Templates updated continuously: no

Compatibility of System

System compatibility: stand alone system - Multibus and RS232C
Languages Supported: all
Programming required: no programming required for test

Microphone / Telephone information

Telephone access: under study
Recommended microphone and/or jack type: Shure SM10

Testing of ASR

Independent Tests: no
Tests in noise: no
Existing Applications: yes

Price and size information

Price: 33,000
Size of system: SBC board or 445x300x70 mm cabinet
Customer Support: yes, free in France


Appendix III. Connected Speech Vocabularies


Vocabulary Words - Scenario 2

 1. Driver          20. Again
 2. Move            21. M-48
 3. Out             22. M-60
 4. Sagger          23. Turn
 5. Gunner          24. Rear
 6. Cease           25. Identified
 7. Fire            26. Sabot
 8. Heat            27. On
 9. Tank            28. Target
10. Steady          29. M-1
11. Right           30. Slower
12. Left            31. I
13. Coax            32. Ammo
14. Can't           33. Forward
15. Go              34. Stop
16. Faster          35. Watch
17. For             36. Load
18. Re-engaging     37. Any
19. Fast


Vocabulary Words - Scenario 3

 1. Part            16. Tool
 2. Number          17. Is
 3. Has             18. Required
 4. Failed          19. To
 5. How             20. Install
 6. Many            21. Zero
 7. Of              22. One
 8. Are             23. Two
 9. In              24. Three
10. Stock           25. Four
11. Which           26. Five
12. Replaces        27. Six
13. What            28. Seven
14. Repair          29. Eight
15. Procedures      30. Nine
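Because each connected-speech test was run against a fixed word list, every test sentence must stay inside that list. A minimal sketch of such a check for Scenario 3, using the 30-word vocabulary above and a few of the Scenario 3 test sentences from Appendix VI; the helper name `out_of_vocabulary` is hypothetical and is not part of any recognizer's software.

```python
# Scenario 3 vocabulary as listed in Appendix III (30 words), lower-cased.
SCENARIO_3_VOCAB = {
    "part", "number", "has", "failed", "how", "many", "of", "are", "in",
    "stock", "which", "replaces", "what", "repair", "procedures", "tool",
    "is", "required", "to", "install", "zero", "one", "two", "three",
    "four", "five", "six", "seven", "eight", "nine",
}


def out_of_vocabulary(sentence, vocab):
    """Return the words of `sentence` that are missing from `vocab`.

    Matching is case-insensitive and a trailing period is ignored.
    """
    words = sentence.lower().rstrip(".").split()
    return [word for word in words if word not in vocab]


# A few Scenario 3 test sentences (Appendix VI); each line prints an empty list.
for s in [
    "Part number six has failed.",
    "How many of part number nine are in stock.",
    "What tool is required to install part number zero five.",
]:
    print(s, out_of_vocabulary(s, SCENARIO_3_VOCAB))
```

A non-empty result flags a word the recognizer was never trained on, which would otherwise be scored as a misrecognition rather than a test-design error.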


Appendix IV. ITT Syntax for Scenario 2

[Figure: ITT Syntax for Scenario 2]

Appendix V. Other Connected Syntax (Verbex, Interstate, and TI)


Syntax  Structure  for  Scenario  2  (except  ITT) 


First 

Second 

Driver 

Move 

Gunner 

Sagger 

Tank 

Cease 

Move 

Heat 

Coax 

Steady 

Can't

Fire 

M-48 

Go 

M-60 

Can't

Sabot 

Turn 

M-l 

Identified 

I 

On 

Fire 

Tank 

Ammo 

Out 

Forward 

For 

Watch 

Re-engaging 

Fourth  Word 

Fifth  Word 

Slower 

Tank 

Right 

Slower 

Faster 

Stop 

Target 

Ammo 

Tank 

Target 

M-60 

Out 

Steady 

Load 

Identified 

Sabot 

Fast 

Third 

Out 

Sagger 

Fire 

Tank 

Right 

Left 

Again 

Faster

Rear

Target

Slower

On

Steady

Any

Load


1.  First  Word  ->  Second  Word  ->  Third  Word 

2.  First  Word  ->  Second  Word  ->  Third  Word  ->  Fourth  Word 

3.  First  Word  ->  Second  Word  ->  Third  Word  ->  Fourth  Word  ->  Fifth  Word 
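The three path rules describe a left-to-right word-slot grammar: a sentence has three, four, or five words, and each word must be one the syntax allows in its position. A minimal sketch, assuming the allowed words per slot can be approximated by collecting the words that occur in each position of the 24 Scenario 2 test sentences (Appendix VI); the actual slot lists are the word columns above, and the helper names `build_slots` and `accepts` are hypothetical.

```python
# Scenario 2 test sentences (Appendix VI), used here to approximate the slot lists.
TEST_SENTENCES = [
    "Driver move out", "Driver sagger sagger", "Gunner cease fire",
    "Gunner heat tank", "Tank steady right", "Move steady left",
    "Coax fire again", "Can't go faster", "M-48 can't fire",
    "M-60 turn rear", "Tank identified again", "Coax on target",
    "M-1 turn right slower", "Move tank slower right", "I can't fire faster",
    "Coax fire on target", "Fire on rear tank", "Gunner identified target tank",
    "Ammo out on M-60 tank", "Driver move tank out slower",
    "Forward steady steady steady stop", "Watch for sagger load ammo",
    "M-1 re-engaging any identified target", "Gunner can't load sabot fast",
]

MIN_WORDS = 3  # path rule 1: First -> Second -> Third Word
MAX_WORDS = 5  # path rule 3: First -> ... -> Fifth Word


def build_slots(sentences):
    """Collect the words seen in each position (slot 0 = First Word, ...)."""
    slots = [set() for _ in range(MAX_WORDS)]
    for sentence in sentences:
        for position, word in enumerate(sentence.lower().split()):
            slots[position].add(word)
    return slots


def accepts(slots, sentence):
    """True if the sentence has 3-5 words and each word is allowed in its slot."""
    words = sentence.lower().split()
    if not MIN_WORDS <= len(words) <= MAX_WORDS:
        return False
    return all(word in slots[i] for i, word in enumerate(words))


slots = build_slots(TEST_SENTENCES)
print(accepts(slots, "Gunner cease fire"))  # True
print(accepts(slots, "Fire gunner cease"))  # False: "gunner" never appears second
```

Because the slots are independent, this kind of grammar also accepts combinations that were never spoken (for example "Driver cease fire"); the syntax limits which words can follow, not which sentences make sense.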




Appendix  VI.  Test  Sentences 


Sentence List - Scenario 2


Three Words

1. Driver move out

2. Driver sagger sagger

3.  Gunner  cease  fire 

4.  Gunner  heat  tank 

5.  Tank  steady  right 

6. Move steady left

7.  Coax  fire  again 

8.  Can't  go  faster 

9.  M-48  can't  fire 

10. M-60  turn  rear 

11.  Tank  identified  again 

12.  Coax  on  target 

Four  Words 

13.  M-l  turn  right  slower 

14.  Move  tank  slower  right 

15.  I  can't  fire  faster 

16.  Coax  fire  on  target 

17.  Fire  on  rear  tank 

18.  Gunner  identified  target  tank 

Five  Words 

19.  Ammo  out  on  M-60  tank 

20.  Driver  move  tank  out  slower 

21.  Forward  steady  steady  steady  stop 

22.  Watch  for  sagger  load  ammo 

23.  M-l  re-engaging  any  identified  target 

24.  Gunner  can't  load  sabot  fast 


Sentence List - Scenario 3


Basic  Sentences 

1.  Part  # _  has  failed. 

2. How many of part # _ are in stock.

3. Which part replaces part # _.

4.  Part  # _  has  failed  what  are  repair  procedures. 

5. What tool is required to install part # _.


Increasingly  longer  sequences  of  numbers  were  used. 


Scenario  3  -  Actual  Sentences  Tested 


1.  Part  number  six  has  failed. 

2. How many of part number nine are in stock.

3. Which part replaces part number two.

4. Part number four has failed what are repair procedures.

5. What tool is required to install part number seven.

6.  Part  number  two  eight  has  failed. 

7.  How  many  of  part  number  three  nine  are  in  stock. 

8.  Which  part  replaces  part  number  seven  four. 

9.  Part  number  one  six  has  failed  what  are  repair  procedures. 

10.  What  tool  is  required  to  install  part  number  zero  five. 

11.  Part  number  seven  six  one  has  failed. 

12.  How  many  of  part  number  zero  two  four  are  in  stock. 

13.  Which  part  replaces  part  number  three  five  eight. 

14. Part number nine two two has failed what are repair procedures.

15. What tool is required to install part number nine nine one.

16. Part number six three two one has failed.

17. How many of part number four four six six are in stock.

18. Which part replaces part number eight seven eight three.

19. Part number six six one one has failed what are repair procedures.

20. What tool is required to install part number two two two eight.

21. Part number seven eight three three seven has failed.

22. How many of part number nine four zero zero nine are in stock.

23. Which part replaces part number nine seven seven three three.

24. Part number one two six six two has failed what are repair procedures.

25. What tool is required to install part number zero one one nine four.
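The 25 test sentences cycle through the five basic sentences while the spoken part number grows by one digit every five sentences (one digit for sentences 1-5, up to five digits for 21-25). A minimal sketch of that construction, assuming the part number is always spoken digit by digit as in the list above; the helper names `layout` and `make_sentence` are hypothetical.

```python
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]

# The five basic sentences; {n} marks where the spoken part number goes.
TEMPLATES = [
    "Part number {n} has failed.",
    "How many of part number {n} are in stock.",
    "Which part replaces part number {n}.",
    "Part number {n} has failed what are repair procedures.",
    "What tool is required to install part number {n}.",
]


def layout(k):
    """Map 1-based sentence number k (1-25) to (template index, digit count)."""
    return (k - 1) % len(TEMPLATES), (k - 1) // len(TEMPLATES) + 1


def make_sentence(template_index, digits):
    """Fill a basic sentence with a part number spoken digit by digit."""
    spoken = " ".join(DIGIT_WORDS[d] for d in digits)
    return TEMPLATES[template_index].format(n=spoken)


template, length = layout(11)
print(template, length)                    # 0 3
print(make_sentence(template, [7, 6, 1]))  # Part number seven six one has failed.
```

The digits themselves (for example 7, 6, 1 in sentence 11) are taken from the list above; the sketch only reproduces the sentence layout, not how those digits were chosen.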




Appendix VII. Rejection / Misrecognition Matrices: Discrete Task


Rejections - Discrete Scenario

Recognizer  Noise                  Total rejections  Rate
Verbex      None                    53               .08
Verbex      Industrial             232               .36
Verbex      Fast Food Restaurant   193               .30
IN4000      None                    68               .11
IN4000      Industrial             273               .43
IN4000      Fast Food Restaurant   245               .38
VOTAN       None                     0               0.00
VOTAN       Industrial               0               0.00
VOTAN       Fast Food Restaurant     0               0.00
TI          None                     8               .01
TI          Industrial              13               .02
TI          Fast Food Restaurant    11               .02
IBM         None                     0               0.00
IBM         Industrial              51               .08
IBM         Fast Food Restaurant     0               0.00
INTEL       None                     6               .01
INTEL       Industrial               3               .00
INTEL       Fast Food Restaurant     8               .01
INCSRB      None                    21               .03
INCSRB      Industrial              36               .06
INCSRB      Fast Food Restaurant    42               .07
KVS         None                    10               .02
KVS         Industrial             107               .17
KVS         Fast Food Restaurant    85               .13
ITT         None                     0               0.00
ITT         Industrial               0               0.00
ITT         Fast Food Restaurant     0               0.00
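The rate column is the rejection total divided by the number of utterances spoken in that recognizer/noise condition. The totals are consistent with 640 utterances per condition, that is, 4 sessions (the N of Appendix X) of 160 utterances each (the ratio of mean to proportion in Appendix VIII); that denominator is an inference, not a figure stated in this table. A minimal sketch:

```python
# Assumed denominators (inferred, see lead-in):
SESSIONS_PER_CONDITION = 4    # N per recognizer/noise cell in Appendix X
UTTERANCES_PER_SESSION = 160  # mean / proportion in Appendix VIII


def rejection_rate(rejections, sessions=SESSIONS_PER_CONDITION,
                   utterances=UTTERANCES_PER_SESSION):
    """Fraction of all utterances in one recognizer/noise condition that were rejected."""
    return rejections / (sessions * utterances)


# A few totals from the table above, printed to two decimals like the Rate column.
for name, total in [("Verbex, none", 53), ("IN4000, industrial", 273),
                    ("KVS, industrial", 107), ("INCSRB, restaurant", 42)]:
    print(f"{name}: {rejection_rate(total):.2f}")
```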



Misrecognitions - Discrete Scenario

Recognizer Noise


1  3  1 


Verbex None

Industrial  1  1 

Fast  Food 

Restaurant  1 

IN4000  None 

Industrial  4 

Fast  Food 

Restaurant  1 

VOTAN  None  2 

Industrial 
Fast  Food 

Restaurant  1  111 

TI  None  1 

Industrial 
Fast  Food 

Restaurant  1 

IBM None 4 16 7

Industrial  1  3  131  1 

Fast  Food 

Restaurant  1  516 

INTEL  None  1 

Industrial  1 

Fast  Food 
Restaurant 

INCSRB  None  1  1  1 

Industrial  3  4  2  23  214 

Fast  Food 

Restaurant  11  11  3 

KVS  None 

Industrial  311  221  11121 

Fast  Food 

Restaurant  14  342  51121 

ITT  None 

Industrial  1 

Fast  Food 

Restaurant  1  1 


5  1 


1  1 
2  3  2  1  4 


3  5 


3  4  2  5  1  1  2  1  1 

1 


1  .001 
2  .0 

1  .001 
0  0.0 
4  .0 

1  .001 


1  .001 
0  0.0 

1  .001 
19  .0 

10  .0 

13  .0 

1  .001 
1  .001 

0  0.0 
4  .0 

26  .0 

12  .0 
1  .001 
16  .0 

30  .0 

1  .001 
2  .0 




Appendix VIII. Tukey Analysis of Means: Discrete Task


Recognizer        Grouping   Mean     N    Mean # Correct / Total Utterances
ITT               A          159.58   12   .997
VOTAN             A          159.17   12   .995
INTEL             A          158.42   12   .990
TI                A          157.17   12   .982
IBM               A B        152.25   12   .952
INTERCSRB         A B        148.25   12   .927
KURZWEIL            B        139.17   12   .870
VERBEX                C      119.67   12   .748
INTERSTATE 4000       C      110.75   12   .692
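The last column is each recognizer's mean number of correct utterances divided by the utterances in one test; dividing every mean by 160 reproduces that column to three decimals, so 160 utterances per test is inferred here rather than quoted. A minimal sketch of the calculation:

```python
# Inferred: each mean / 160 reproduces the proportion column above.
UTTERANCES_PER_TEST = 160

# Mean correct utterances per recognizer (N = 12), from the table above.
MEAN_CORRECT = {
    "ITT": 159.58, "VOTAN": 159.17, "INTEL": 158.42, "TI": 157.17,
    "IBM": 152.25, "INTERCSRB": 148.25, "KURZWEIL": 139.17,
    "VERBEX": 119.67, "INTERSTATE 4000": 110.75,
}


def proportion_correct(mean, total=UTTERANCES_PER_TEST):
    """Mean correct utterances as a fraction of all utterances in a test."""
    return mean / total


for recognizer, mean in MEAN_CORRECT.items():
    print(f"{recognizer:16s} {proportion_correct(mean):.3f}")
```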



Appendix IX. Tukey Analysis of Noise Effects: Discrete Task


NOISE                   MEAN     N    # CORRECT UTTERANCES / # TOTAL UTTERANCES
NONE                    154.53   36   .966
FAST FOOD RESTAURANT    142.00   36   .888
INDUSTRIAL              138.28   36   .864



Appendix X. Tukey Analysis of Recognizer * Noise: Discrete Task


RECOGNIZER        NOISE         MEAN     N
ITT               NONE          159.75   4
ITT               RESTAURANT    159.50   4
ITT               INDUSTRIAL    159.50   4
VOTAN             NONE          159.50   4
VOTAN             RESTAURANT    159.00   4
INTEL             NONE          159.00   4
VOTAN             INDUSTRIAL    159.00   4
INTEL             INDUSTRIAL    158.25   4
INTEL             RESTAURANT    158.00   4
TI                NONE          157.75   4
TI                RESTAURANT    157.00   4
KURZWEIL          NONE          157.00   4
TI                INDUSTRIAL    156.75   4
IBM               RESTAURANT    156.75   4
IBM               NONE          155.25   4
INTERSTATE CSRB   NONE          153.75   4
VERBEX            NONE          146.50   4
INTERSTATE CSRB   RESTAURANT    146.50   4
IBM               INDUSTRIAL    144.75   4
INTERSTATE CSRB   INDUSTRIAL    144.50   4
INTERSTATE 4000   NONE          143.00   4
KURZWEIL          RESTAURANT    131.25   4
KURZWEIL          INDUSTRIAL    129.25   4
VERBEX            RESTAURANT    111.50   4
VERBEX            INDUSTRIAL    101.00   4
INTERSTATE 4000   RESTAURANT     98.50   4
INTERSTATE 4000   INDUSTRIAL     90.75   4

4 

a 


i>  * 


5 


pr 


«*. 


£ 


>>  «>  S'  V  V  v- 


37 
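Because every recognizer * noise cell has N = 4, averaging a recognizer's three noise-condition cells should give its overall discrete-task mean. The short sketch below performs that consistency check; the dictionary simply transcribes the cell means, and the helper name `recognizer_means` is hypothetical.

```python
from statistics import mean

# Recognizer * noise cell means, discrete task (N = 4 per cell).
CELLS = {
    ("ITT", "NONE"): 159.75, ("ITT", "RESTAURANT"): 159.50, ("ITT", "INDUSTRIAL"): 159.50,
    ("VOTAN", "NONE"): 159.50, ("VOTAN", "RESTAURANT"): 159.00, ("VOTAN", "INDUSTRIAL"): 159.00,
    ("INTEL", "NONE"): 159.00, ("INTEL", "RESTAURANT"): 158.00, ("INTEL", "INDUSTRIAL"): 158.25,
    ("TI", "NONE"): 157.75, ("TI", "RESTAURANT"): 157.00, ("TI", "INDUSTRIAL"): 156.75,
    ("IBM", "NONE"): 155.25, ("IBM", "RESTAURANT"): 156.75, ("IBM", "INDUSTRIAL"): 144.75,
    ("INTERSTATE CSRB", "NONE"): 153.75, ("INTERSTATE CSRB", "RESTAURANT"): 146.50,
    ("INTERSTATE CSRB", "INDUSTRIAL"): 144.50,
    ("KURZWEIL", "NONE"): 157.00, ("KURZWEIL", "RESTAURANT"): 131.25,
    ("KURZWEIL", "INDUSTRIAL"): 129.25,
    ("VERBEX", "NONE"): 146.50, ("VERBEX", "RESTAURANT"): 111.50, ("VERBEX", "INDUSTRIAL"): 101.00,
    ("INTERSTATE 4000", "NONE"): 143.00, ("INTERSTATE 4000", "RESTAURANT"): 98.50,
    ("INTERSTATE 4000", "INDUSTRIAL"): 90.75,
}


def recognizer_means(cells):
    """Average each recognizer's cell means over the noise conditions."""
    by_recognizer = {}
    for (recognizer, _noise), value in cells.items():
        by_recognizer.setdefault(recognizer, []).append(value)
    return {recognizer: mean(values) for recognizer, values in by_recognizer.items()}


for recognizer, value in recognizer_means(CELLS).items():
    print(f"{recognizer:16s} {value:.2f}")
```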


Appendix XI.  Confusion and Error Matrices:  Discrete Task

Vocabulary (utterance rows and word-recognized columns): Zero, One, Two, Three, Four, Five, Six, Seven, Eight, Nine, Yes, No, Up, Down, Right, Left.

[Discrete Utterance Confusion Matrix: Interstate CSR]

[Discrete Utterance Confusion Matrix: Kurzweil]

[Discrete Utterance Confusion Matrix: IBM]


Appendix  XII.  Tukey  Analysis  of  Speaker  Effects:  Scenario  2 


TUKEY'S STUDENTIZED RANGE (HSD) TEST FOR VARIABLE: CORRECT RECOGNITION (COR)
ALPHA=0.05  CONFIDENCE=0.95  DF=24  MSE=64.4803
CRITICAL VALUE OF STUDENTIZED RANGE=4.166
COMPARISONS SIGNIFICANT AT THE 0.05 LEVEL ARE INDICATED BY '***'

               SIMULTANEOUS          SIMULTANEOUS
                   LOWER   DIFFERENCES   UPPER
SUBJECT         CONFIDENCE   BETWEEN  CONFIDENCE
COMPARISON         LIMIT      MEANS      LIMIT
2 - 1                -4.529      4.050     12.629
2 - 3                -2.953      5.441     13.835
2 - 4                 0.054      8.633     17.213  ***
1 - 2               -12.629     -4.050      4.529
1 - 3                -7.477      1.391     10.259
1 - 4                -4.460      4.583     13.627
3 - 2               -13.835     -5.441      2.953
3 - 1               -10.259     -1.391      7.477
3 - 4                -5.675      3.192     12.060
4 - 2               -17.213     -8.633     -0.054  ***
4 - 1               -13.627     -4.583      4.460
4 - 3               -12.060     -3.192      5.675
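In these listings a comparison is flagged with '***' exactly when its simultaneous confidence interval excludes zero, and each j - i row is the i - j row with every value negated and the two limits swapped. A small sketch of both rules, using a hypothetical `Comparison` type:

```python
from typing import NamedTuple


class Comparison(NamedTuple):
    """One row of a Tukey HSD listing: simultaneous interval for mean_i - mean_j."""
    lower: float
    diff: float
    upper: float

    def significant(self) -> bool:
        # '***' in the listing: the interval does not contain zero.
        return self.lower > 0 or self.upper < 0

    def flipped(self) -> "Comparison":
        # The reverse comparison (j - i) negates every value and swaps the limits.
        return Comparison(-self.upper, -self.diff, -self.lower)


row_2_4 = Comparison(0.054, 8.633, 17.213)   # subject 2 - subject 4
row_2_1 = Comparison(-4.529, 4.050, 12.629)  # subject 2 - subject 1
print(row_2_4.significant(), row_2_4.flipped())
print(row_2_1.significant())
```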

Appendix XIII.  Tukey Analysis of Speaker * Recognizer:  Scenario 2

RECOGNIZER     SIMULTANEOUS          SIMULTANEOUS
* SUBJECT          LOWER   DIFFERENCES   UPPER
COMPARISON      CONFIDENCE   BETWEEN  CONFIDENCE
                   LIMIT      MEANS      LIMIT
ITT #2 - ITT #1     -25.469      0.000     25.469
ITT #2 - ITT #4     -25.469      0.000     25.469
ITT #2 - ITT #3     -23.469      2.000     27.469
ITT #2 - VER #2     -13.136     12.333     37.802
ITT #2 - INT #1      -8.469     17.000     42.469
ITT #2 - VOT #2      -8.136     17.333     42.802
ITT #2 - INT #4      -4.802     20.667     46.136
ITT #2 - VER #1      -4.136     21.333     46.802
ITT #2 - VER #3      -3.802     21.667     47.136
ITT #2 - INT #3      -5.975     22.500     50.975
ITT #2 - INT #2       0.198     25.667     51.136  ***
ITT #2 - VER #4       2.198     27.667     53.136  ***
ITT #2 - TI #2        3.531     29.000     54.469  ***
ITT #2 - TI #3        1.525     30.000     58.475  ***
ITT #2 - VOT #3      12.531     38.000     63.469  ***
ITT #2 - VOT #1      19.864     45.333     70.802  ***
ITT #2 - VOT #4      28.198     53.667     79.136  ***
ITT #1 - ITT #2     -25.469      0.000     25.469
ITT #1 - ITT #4     -25.469      0.000     25.469
ITT #1 - ITT #3     -23.469      2.000     27.469
ITT #1 - VER #2     -13.136     12.333     37.802
ITT #1 - INT #1      -8.469     17.000     42.469
ITT #1 - VOT #2      -8.136     17.333     42.802
ITT #1 - INT #4      -4.802     20.667     46.136
ITT #1 - VER #1      -4.136     21.333     46.802
ITT #1 - VER #3      -3.802     21.667     47.136
ITT #1 - INT #3      -5.975     22.500     50.975
ITT #1 - INT #2       0.198     25.667     51.136  ***
ITT #1 - VER #4       2.198     27.667     53.136  ***
ITT #1 - TI #2        3.531     29.000     54.469  ***
ITT #1 - TI #3        1.525     30.000     58.475  ***
ITT #1 - VOT #3      12.531     38.000     63.469  ***
ITT #1 - VOT #1      19.864     45.333     70.802  ***
ITT #1 - VOT #4      28.198     53.667     79.136  ***
ITT #4 - ITT #2     -25.469      0.000     25.469
ITT #4 - ITT #1     -25.469      0.000     25.469
ITT #4 - ITT #3     -23.469      2.000     27.469
ITT #4 - VER #2     -13.136     12.333     37.802
ITT #4 - INT #1      -8.469     17.000     42.469
ITT #4 - VOT #2      -8.136     17.333     42.802
ITT #4 - INT #4      -4.802     20.667     46.136
ITT #4 - VER #1      -4.136     21.333     46.802
ITT #4 - VER #3      -3.802     21.667     47.136
ITT #4 - INT #3      -5.975     22.500     50.975
ITT #4 - INT #2       0.198     25.667     51.136  ***
ITT #4 - VER #4       2.198     27.667     53.136  ***
ITT #4 - TI #2        3.531     29.000     54.469  ***
ITT #4 - TI #3        1.525     30.000     58.475  ***
ITT #4 - VOT #3      12.531     38.000     63.469  ***
ITT #4 - VOT #1      19.864     45.333     70.802  ***
ITT #4 - VOT #4      28.198     53.667     79.136  ***
ITT #3 - ITT #2     -27.469     -2.000     23.469
ITT #3 - ITT #1     -27.469     -2.000     23.469
ITT #3 - ITT #4     -27.469     -2.000     23.469
ITT #3 - VER #2     -15.136     10.333     35.802
ITT #3 - INT #1     -10.469     15.000     40.469
ITT #3 - VOT #2     -10.136     15.333     40.802
ITT #3 - INT #4      -6.802     18.667     44.136
ITT #3 - VER #1      -6.136     19.333     44.802
ITT #3 - VER #3      -5.802     19.667     45.136
ITT #3 - INT #3      -7.975     20.500     48.975
ITT #3 - INT #2      -1.802     23.667     49.136
ITT #3 - VER #4       0.198     25.667     51.136  ***
ITT #3 - TI #2        1.531     27.000     52.469  ***
ITT #3 - TI #3       -0.475     28.000     56.475
ITT #3 - VOT #3      10.531     36.000     61.469  ***
ITT #3 - VOT #1      17.864     43.333     68.802  ***
ITT #3 - VOT #4      26.198     51.667     77.136  ***
VER #2 - ITT #2     -37.802    -12.333     13.136
VER #2 - ITT #1     -37.802    -12.333     13.136
VER #2 - ITT #4     -37.802    -12.333     13.136
VER #2 - ITT #3     -35.802    -10.333     15.136
VER #2 - INT #1     -20.802      4.667     30.136
VER #2 - VOT #2     -20.469      5.000     30.469
VER #2 - INT #4     -17.136      8.333     33.802
VER #2 - VER #1     -16.469      9.000     34.469
VER #2 - VER #3     -16.136      9.333     34.802
VER #2 - INT #3     -18.309     10.167     38.642
VER #2 - INT #2     -12.136     13.333     38.802
VER #2 - VER #4     -10.136     15.333     40.802
VER #2 - TI #2       -8.802     16.667     42.136
VER #2 - TI #3      -10.809     17.667     46.142
VER #2 - VOT #3       0.198     25.667     51.136  ***
VER #2 - VOT #1       7.531     33.000     58.469  ***
VER #2 - VOT #4      15.864     41.333     66.802  ***
INT #1 - ITT #2     -42.469    -17.000      8.469
INT #1 - ITT #1     -42.469    -17.000      8.469
INT #1 - ITT #4     -42.469    -17.000      8.469
INT #1 - ITT #3     -40.469    -15.000     10.469
INT #1 - VER #2     -30.136     -4.667     20.802
INT #1 - VOT #2     -25.136      0.333     25.802
INT #1 - INT #4     -21.802      3.667     29.136
INT #1 - VER #1     -21.136      4.333     29.802
INT #1 - VER #3     -20.802      4.667     30.136
INT #1 - INT #3     -22.975      5.500     33.975
INT #1 - INT #2     -16.802      8.667     34.136
INT #1 - VER #4     -14.802     10.667     36.136
INT #1 - TI #2      -13.469     12.000     37.469
INT #1 - TI #3      -15.475     13.000     41.475
INT #1 - VOT #3      -4.469     21.000     46.469
INT #1 - VOT #1       2.864     28.333     53.802  ***
INT #1 - VOT #4      11.198     36.667     62.136  ***
VOT #2 - ITT #2     -42.802    -17.333      8.136
VOT #2 - ITT #1     -42.802    -17.333      8.136
VOT #2 - ITT #4     -42.802    -17.333      8.136
VOT #2 - ITT #3     -40.802    -15.333     10.136
VOT #2 - VER #2     -30.469     -5.000     20.469
VOT #2 - INT #1     -25.802     -0.333     25.136
VOT #2 - INT #4     -22.136      3.333     28.802
VOT #2 - VER #1     -21.469      4.000     29.469
VOT #2 - VER #3     -21.136      4.333     29.802
VOT #2 - INT #3     -23.309      5.167     33.642
VOT #2 - INT #2     -17.136      8.333     33.802
VOT #2 - VER #4     -15.136     10.333     35.802
VOT #2 - TI #2      -13.802     11.667     37.136
VOT #2 - TI #3      -15.809     12.667     41.142
VOT #2 - VOT #3      -4.802     20.667     46.136
VOT #2 - VOT #1       2.531     28.000     53.469  ***
VOT #2 - VOT #4      10.864     36.333     61.802  ***
INT #4 - ITT #2     -46.136    -20.667      4.802
INT #4 - ITT #1     -46.136    -20.667      4.802
INT #4 - ITT #4     -46.136    -20.667      4.802
INT #4 - ITT #3     -44.136    -18.667      6.802
INT #4 - VER #2     -33.802     -8.333     17.136
INT #4 - INT #1     -29.136     -3.667     21.802
INT #4 - VOT #2     -28.802     -3.333     22.136
INT #4 - VER #1     -24.802      0.667     26.136
INT #4 - VER #3     -24.469      1.000     26.469
INT #4 - INT #3     -26.642      1.833     30.309
INT #4 - INT #2     -20.469      5.000     30.469
INT #4 - VER #4     -18.469      7.000     32.469
INT #4 - TI #2      -17.136      8.333     33.802
INT #4 - TI #3      -19.142      9.333     37.809
INT #4 - VOT #3      -8.136     17.333     42.802
INT #4 - VOT #1      -0.802     24.667     50.136
INT #4 - VOT #4       7.531     33.000     58.469  ***
VER #1 - ITT #2     -46.802    -21.333      4.136
VER #1 - ITT #1     -46.802    -21.333      4.136
VER #1 - ITT #4     -46.802    -21.333      4.136
VER #1 - ITT #3     -44.802    -19.333      6.136
VER #1 - VER #2     -34.469     -9.000     16.469
VER #1 - INT #1     -29.802     -4.333     21.136
VER #1 - VOT #2     -29.469     -4.000     21.469
VER #1 - INT #4     -26.136     -0.667     24.802
VER #1 - VER #3     -25.136      0.333     25.802
VER #1 - INT #3     -27.309      1.167     29.642
VER #1 - INT #2     -21.136      4.333     29.802
VER #1 - VER #4     -19.136      6.333     31.802
VER #1 - TI #2      -17.802      7.667     33.136
VER #1 - TI #3      -19.809      8.667     37.142
VER #1 - VOT #3      -8.802     16.667     42.136
VER #1 - VOT #1      -1.469     24.000     49.469
VER #1 - VOT #4       6.864     32.333     57.802  ***
VER #3 - ITT #2     -47.136    -21.667      3.802
VER #3 - ITT #1     -47.136    -21.667      3.802
VER #3 - ITT #4     -47.136    -21.667      3.802
VER #3 - ITT #3     -45.136    -19.667      5.802
VER #3 - VER #2     -34.802     -9.333     16.136
VER #3 - INT #1     -30.136     -4.667     20.802
VER #3 - VOT #2     -29.802     -4.333     21.136
VER #3 - INT #4     -26.469     -1.000     24.469
VER #3 - VER #1     -25.802     -0.333     25.136
VER #3 - INT #3     -27.642      0.833     29.309
VER #3 - INT #2     -21.469      4.000     29.469
VER #3 - VER #4     -19.469      6.000     31.469
VER #3 - TI #2      -18.136      7.333     32.802
VER #3 - TI #3      -20.142      8.333     36.809
VER #3 - VOT #3      -9.136     16.333     41.802
VER #3 - VOT #1      -1.802     23.667     49.136
VER #3 - VOT #4       6.531     32.000     57.469  ***
INT #3 - ITT #2     -50.975    -22.500      5.975
INT #3 - ITT #1     -50.975    -22.500      5.975
INT #3 - ITT #4     -50.975    -22.500      5.975
INT #3 - ITT #3     -48.975    -20.500      7.975
INT #3 - VER #2     -38.642    -10.167     18.309
INT #3 - INT #1     -33.975     -5.500     22.975
INT #3 - VOT #2     -33.642     -5.167     23.309
INT #3 - INT #4     -30.309     -1.833     26.642
INT #3 - VER #1     -29.642     -1.167     27.309
INT #3 - VER #3     -29.309     -0.833     27.642
INT #3 - INT #2     -25.309      3.167     31.642
INT #3 - VER #4     -23.309      5.167     33.642
INT #3 - TI #2      -21.975      6.500     34.975
INT #3 - TI #3      -23.693      7.500     38.693
INT #3 - VOT #3     -12.975     15.500     43.975
INT #3 - VOT #1      -5.642     22.833     51.309
INT #3 - VOT #4       2.691     31.167     59.642  ***
INT #2 - ITT #2     -51.136    -25.667     -0.198  ***
INT #2 - ITT #1     -51.136    -25.667     -0.198  ***
INT #2 - ITT #4     -51.136    -25.667     -0.198  ***
INT #2 - ITT #3     -49.136    -23.667      1.802
INT #2 - VER #2     -38.802    -13.333     12.136
INT #2 - INT #1     -34.136     -8.667     16.802
INT #2 - VOT #2     -33.802     -8.333     17.136
INT #2 - INT #4     -30.469     -5.000     20.469
INT #2 - VER #1     -29.802     -4.333     21.136
INT #2 - VER #3     -29.469     -4.000     21.469
INT #2 - INT #3     -31.642     -3.167     25.309
INT #2 - VER #4     -23.469      2.000     27.469
INT #2 - TI #2      -22.136      3.333     28.802
INT #2 - TI #3      -24.142      4.333     32.809
INT #2 - VOT #3     -13.136     12.333     37.802
INT #2 - VOT #1      -5.802     19.667     45.136
INT #2 - VOT #4       2.531     28.000     53.469  ***
VER #4 - ITT #2     -53.136    -27.667     -2.198  ***
VER #4 - ITT #1     -53.136    -27.667     -2.198  ***
VER #4 - ITT #4     -53.136    -27.667     -2.198  ***
VER #4 - ITT #3     -51.136    -25.667     -0.198  ***
VER #4 - VER #2     -40.802    -15.333     10.136
VER #4 - INT #1     -36.136    -10.667     14.802
VER #4 - VOT #2     -35.802    -10.333     15.136
VER #4 - INT #4     -32.469     -7.000     18.469
VER #4 - VER #1     -31.802     -6.333     19.136
VER #4 - VER #3     -31.469     -6.000     19.469
VER #4 - INT #3     -33.642     -5.167     23.309
VER #4 - INT #2     -27.469     -2.000     23.469
VER #4 - TI #2      -24.136      1.333     26.802
VER #4 - TI #3      -26.142      2.333     30.809
VER #4 - VOT #3     -15.136     10.333     35.802
VER #4 - VOT #1      -7.802     17.667     43.136
VER #4 - VOT #4       0.531     26.000     51.469  ***
TI #2  - ITT #2     -54.469    -29.000     -3.531  ***
TI #2  - ITT #1     -54.469    -29.000     -3.531  ***
TI #2  - ITT #4     -54.469    -29.000     -3.531  ***
TI #2  - ITT #3     -52.469    -27.000     -1.531  ***
TI #2  - VER #2     -42.136    -16.667      8.802
TI #2  - INT #1     -37.469    -12.000     13.469
TI #2  - VOT #2     -37.136    -11.667     13.802
TI #2  - INT #4     -33.802     -8.333     17.136
TI #2  - VER #1     -33.136     -7.667     17.802
TI #2  - VER #3     -32.802     -7.333     18.136
TI #2  - INT #3     -34.975     -6.500     21.975
TI #2  - INT #2     -28.802     -3.333     22.136
TI #2  - VER #4     -26.802     -1.333     24.136
TI #2  - TI #3      -27.475      1.000     29.475
TI #2  - VOT #3     -16.469      9.000     34.469
TI #2  - VOT #1      -9.136     16.333     41.802
TI #2  - VOT #4      -0.802     24.667     50.136
TI #3  - ITT #2     -58.475    -30.000     -1.525  ***
TI #3  - ITT #1     -58.475    -30.000     -1.525  ***
TI #3  - ITT #4     -58.475    -30.000     -1.525  ***
TI #3  - ITT #3     -56.475    -28.000      0.475
TI #3  - VER #2     -46.142    -17.667     10.809
TI #3  - INT #1     -41.475    -13.000     15.475
TI #3  - VOT #2     -41.142    -12.667     15.809
TI #3  - INT #4     -37.809     -9.333     19.142
TI #3  - VER #1     -37.142     -8.667     19.809
TI #3  - VER #3     -36.809     -8.333     20.142
TI #3  - INT #3     -38.693     -7.500     23.693
TI #3  - INT #2     -32.809     -4.333     24.142
TI #3  - VER #4     -30.809     -2.333     26.142
TI #3  - TI #2      -29.475     -1.000     27.475
TI #3  - VOT #3     -20.475      8.000     36.475
TI #3  - VOT #1     -13.142     15.333     43.809
TI #3  - VOT #4      -4.809     23.667     52.142
VOT #3 - ITT #2     -63.469    -38.000    -12.531  ***
VOT #3 - ITT #1     -63.469    -38.000    -12.531  ***
VOT #3 - ITT #4     -63.469    -38.000    -12.531  ***
VOT #3 - ITT #3     -61.469    -36.000    -10.531  ***
VOT #3 - VER #2     -51.136    -25.667     -0.198  ***
VOT #3 - INT #1     -46.469    -21.000      4.469
VOT #3 - VOT #2     -46.136    -20.667      4.802
VOT #3 - INT #4     -42.802    -17.333      8.136
VOT #3 - VER #1     -42.136    -16.667      8.802
VOT #3 - VER #3     -41.802    -16.333      9.136
VOT #3 - INT #3     -43.975    -15.500     12.975
VOT #3 - INT #2     -37.802    -12.333     13.136
VOT #3 - VER #4     -35.802    -10.333     15.136
VOT #3 - TI #2      -34.469     -9.000     16.469
VOT #3 - TI #3      -36.475     -8.000     20.475
VOT #3 - VOT #1     -18.136      7.333     32.802
VOT #3 - VOT #4      -9.802     15.667     41.136
VOT #1 - ITT #2     -70.802    -45.333    -19.864  ***
VOT #1 - ITT #1     -70.802    -45.333    -19.864  ***
VOT #1 - ITT #4     -70.802    -45.333    -19.864  ***
VOT #1 - ITT #3     -68.802    -43.333    -17.864  ***
VOT #1 - VER #2     -58.469    -33.000     -7.531  ***
VOT #1 - INT #1     -53.802    -28.333     -2.864  ***
VOT #1 - VOT #2     -53.469    -28.000     -2.531  ***
VOT #1 - INT #4     -50.136    -24.667      0.802
VOT #1 - VER #1     -49.469    -24.000      1.469
VOT #1 - VER #3     -49.136    -23.667      1.802
VOT #1 - INT #3     -51.309    -22.833      5.642
VOT #1 - INT #2     -45.136    -19.667      5.802
VOT #1 - VER #4     -43.136    -17.667      7.802
VOT #1 - TI #2      -41.802    -16.333      9.136
VOT #1 - TI #3      -43.809    -15.333     13.142
VOT #1 - VOT #3     -32.802     -7.333     18.136
VOT #1 - VOT #4     -17.136      8.333     33.802
VOT #4 - ITT #2     -79.136    -53.667    -28.198  ***
VOT #4 - ITT #1     -79.136    -53.667    -28.198  ***
VOT #4 - ITT #4     -79.136    -53.667    -28.198  ***
VOT #4 - ITT #3     -77.136    -51.667    -26.198  ***
VOT #4 - VER #2     -66.802    -41.333    -15.864  ***
VOT #4 - INT #1     -62.136    -36.667    -11.198  ***
VOT #4 - VOT #2     -61.802    -36.333    -10.864  ***
VOT #4 - INT #4     -58.469    -33.000     -7.531  ***
VOT #4 - VER #1     -57.802    -32.333     -6.864  ***
VOT #4 - VER #3     -57.469    -32.000     -6.531  ***
VOT #4 - INT #3     -59.642    -31.167     -2.691  ***
VOT #4 - INT #2     -53.469    -28.000     -2.531  ***
VOT #4 - VER #4     -51.469    -26.000     -0.531  ***
VOT #4 - TI #2      -50.136    -24.667      0.802
VOT #4 - TI #3      -52.142    -23.667      4.809
VOT #4 - VOT #3     -41.136    -15.667      9.802
VOT #4 - VOT #1     -33.802     -8.333     17.136
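Every row of this speaker * recognizer table follows from the cell means and cell sizes. In the sketch below the differences ITT #2 - X are read from the ITT #2 rows (exact thirds are assumed where the listing prints .333 or .667). The cell sizes (3 per cell, 2 for INT #3 and TI #3) and the constant C = q * sqrt(MSE) = 44.114 are assumptions back-calculated from the three half-widths that occur in the listing (25.469, 28.475, 31.193); this page does not print them. The names `D`, `N`, `C`, `half_width` and `comparison` are hypothetical.

```python
import math

# Differences ITT #2 - X (mean of ITT #2 minus mean of X), read from the ITT #2 rows.
D = {
    "ITT #2": 0.0, "ITT #1": 0.0, "ITT #4": 0.0, "ITT #3": 2.0,
    "VER #2": 37 / 3, "INT #1": 17.0, "VOT #2": 52 / 3, "INT #4": 62 / 3,
    "VER #1": 64 / 3, "VER #3": 65 / 3, "INT #3": 22.5, "INT #2": 77 / 3,
    "VER #4": 83 / 3, "TI #2": 29.0, "TI #3": 30.0, "VOT #3": 38.0,
    "VOT #1": 136 / 3, "VOT #4": 161 / 3,
}

# Assumed cell sizes and constant, back-calculated from the printed half-widths.
N = {name: 3 for name in D}
N["INT #3"] = N["TI #3"] = 2
C = 44.114  # assumed q * sqrt(MSE)


def half_width(a, b):
    """Tukey-Kramer half-width for the comparison a - b."""
    return C * math.sqrt((1 / N[a] + 1 / N[b]) / 2)


def comparison(a, b):
    """Return (lower, difference, upper, significant) for mean(a) - mean(b)."""
    diff = D[b] - D[a]
    hw = half_width(a, b)
    lower, upper = diff - hw, diff + hw
    return round(lower, 3), round(diff, 3), round(upper, 3), lower > 0 or upper < 0


print(comparison("ITT #2", "INT #2"))
print(comparison("INT #3", "TI #3"))
```

Because the constant is back-calculated, the regenerated limits can differ from the printed ones by about 0.001 in the last place.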


Appendix  XIV.  Tukey  Analysis  of  Recognizer  *  Noise:  Scenario  2 


TUKEY'S STUDENTIZED RANGE (HSD) TEST FOR VARIABLE: CORRECT
ALPHA=0.05  CONFIDENCE=0.95  DF=24  MSE=64.4803
CRITICAL VALUE OF STUDENTIZED RANGE=5.319

COMPARISONS SIGNIFICANT AT THE 0.05 LEVEL ARE INDICATED BY '***'
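The interval half-widths in this listing are consistent with the Tukey(-Kramer) form q * sqrt(MSE/2 * (1/n_i + 1/n_j)). With the printed q = 5.319 and MSE = 64.4803, equal cells of n = 4 give a half-width of about 21.356, close to the 21.354 printed for most rows (the small gap is consistent with q having been rounded for printing). The cell sizes 3, 2 and 1 used below are assumptions inferred from the other half-widths in the listing (23.065, 26.153, 33.764), not values stated in the report; `tukey_kramer_half_width` is a hypothetical helper.

```python
import math

Q_CRIT = 5.319    # critical value of the studentized range (printed)
MSE = 64.4803     # error mean square (printed), DF = 24


def tukey_kramer_half_width(q: float, mse: float, n_i: int, n_j: int) -> float:
    """Half-width of the simultaneous interval for mean_i - mean_j."""
    return q * math.sqrt(mse / 2.0 * (1.0 / n_i + 1.0 / n_j))


# Assumed cell-size pairs (inferred from the printed half-widths).
for n_i, n_j in [(4, 4), (4, 3), (4, 2), (4, 1)]:
    print(n_i, n_j, round(tukey_kramer_half_width(Q_CRIT, MSE, n_i, n_j), 3))

# Example: ITT #2 - INT #0 has a difference of 23.000, so its interval excludes zero.
hw = tukey_kramer_half_width(Q_CRIT, MSE, 4, 4)
print(23.0 - hw > 0)
```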


RECOGNIZER 
*  NOISE 
COMPARISON 


SIMULTANEOUS 

LOWER 

CONFIDENCE 

LIMIT 


BETWEEN 

MEANS 


SIMULTANEOUS 

UPPER 

CONFIDENCE 

LIMIT 


•j  “A 
■J  V 


$ ; 

#  > 

i 

l  » 


:•  /■ 

*  - 


ITT 

#2 

- 

ITT  #0 

-21.354 

0.000 

21.354 

ITT 

#2 

- 

ITT  #1 

-19.854 

1 . 500 

22.854 

ITT 

#2 

- 

VER  #2 

-9.354 

12.000 

33.354 

ITT 

#2 

- 

VER  #0 

-7.104 

14.250 

35 . 604 

ITT 

#2 

- 

INT  #2 

-5.854 

15 . 500 

36.854 

ITT 

#2 

- 

INT  #0 

1.646 

23.000 

44.354 

*  *  * 

ITT 

#2 

- 

TI  *0 

-2 .153 

24.000 

50.153 

ITT 

#2 

- 

TI  #1 

-7.764 

26.000 

59.764 

ITT 

#2 

- 

INT  #1 

3.985 

27.000 

50.065 

A  *  ★ 

ITT 

# 2 

- 

VOT  #0 

18 . 146 

34.500 

55.854 

★  *  * 

ITT 

#2 

- 

VOT  #2 

14.646 

36 . 000 

57.354 

*  *  * 

ITT 

#2 

- 

VER  #1 

14.646 

36.000 

57.354 

*  ★  * 

ITT 

#2 

- 

TI  #  2 

10 . 347 

36 . 500 

62 .653 

*  *  * 

ITT 

#2 

- 

VOT  #1 

23 . 896 

45.250 

66 . 604 

*  ★  ★ 

ITT 

#0 

-  ITT  #2 

-21.354 

0.000 

21 . 354 

ITT 

#0 

-  ITT  #1 

-19 . 854 

1 . 500 

22 . 854 

ITT 

#0 

-  VER  #2 

-9 . 354 

12.000 

33 .354 

ITT 

#0 

-  VER  #0 

-7 . 104 

14.250 

35.604 

ITT 

#0 

-  INT  #2 

-5 . 854 

15 . 500 

36 . 854 

ITT 

#0 

-  INT  #0 

1 . 646 

23 . 000 

44 . 354 

4r  *  * 

ITT 

#0 

-  TI  #0 

-2.153 

24.000 

50.153 

ITT 

#0 

-  TI  #1 

-7.764 

26 . 000 

59.764 

ITT 

#0 

-  INT  #1 

3 . 935 

27 . 000 

50 . 065 

*  *  * 

ITT 

#0 

-  VOT  #0 

13.146 

34 . 500 

55 . 854 

*  *  ★ 

ITT 

#0 

-  VOT  #2 

14.646 

36.000 

57.354 

*  *  * 

ITT 

#0 

-  VER  #1 

14 . 646 

36 . 000 

57 . 354 

*•  *  * 

ITT 

#0 

-  TI  #2 

10 . 347 

36 . 500 

62 . 653 

*  *  * 

ITT 

#0 

-  VOT  # 1 

23 .896 

45 . 250 

66 . 604 

*  *  * 

ITT 

#1 

_ 

ITT  # 2 

-22 .854 

-1 . 500 

19 . 854 

/ 

If 

cr  *_•  a*  » 


*r*"  ».»  rm'  nr  ■»  pj«  PJP.T  »."  H"  ■.'tPk'  !«.'  • "?  W  L"j  1H»  ftfl  g*  U  Af  »JT  >  M  HW  li.M'Wl*  wrWgtWtfkgn»t«imw«imtlUlUWWWW 

. 


!  s* 

!  £ 

* 

SIMULTANEOUS 

SIMULTANEOUS 

1  ■ 

RECOGNIZER 

LOWER 

DIFFERENCES 

UPPER 

1  5 

*  NOISE 

CONFIDENCE 

BETWEEN 

CONFIDENCE 

E 

COMPARISON 

LIMIT 

MEANS 

LIMIT 

1  % 

ITT 

#1  -  ITT  #0 

-22.854 

-1 . 500 

19 . 854 

r 

ITT 

#1  -  VER  #2 

-10.854 

10 . 500 

31 . 854 

1 

ITT 

#1  -  VER  #0 

-8.604 

12.750 

34. 104 

$ 

ITT 

#1  -  INT  #2 

-7.354 

14 . 000 

35.354 

j  *» 

ITT 

#1  -  INT  #0 

0 . 146 

21 . 500 

42.854  *  *  * 

» 

jkl  ... 

ITT 

#1  -  TI  #0 

-3 . 653 

22.500 

48.653 

£ 

ITT 

#1  -  TI  #1 

-9.264 

24.500 

58 . 264 

i  A 

ITT 

#1  -  INT  #1 

2.435 

25 . 500 

48 . 565  *  *  * 

l 

ITT 

#1  -  VOT  #0 

11.646 

33 . 000 

54.354  *** 

^  f  A 

ITT 

#1  -  VOT  #2 

13.146 

34.500 

55.854  *** 

V 

•  w  _ 

*  .  » 

ITT 

#1  -  VER  #1 

13.146 

34 . 500 

55.854  *** 

,* 

ITT 

#1  -  TI  #2 

8.847 

35.000 

61.153  *  *  * 

■ 

•  v" 

ITT 

#1  -  VOT  #1 

22.396 

43.750 

65 . 104  *  *  * 

J  ‘ 

VER 

#2  -  ITT  #2 

-33.354 

-12 . 000 

9.354 

* 

VER 

#2  -  ITT  #0 

-33.354 

-12 . 000 

9.354 

•  ^ 

VER 

#2  -  ITT  #1 

-31.854 

-10.500 

10 . 854 

■!  /; 

VER 

#2  -  VER  #0 

-19.104 

2.250 

23.604 

? 

VER 

#2  -  INT  #2 

-17.854 

3 . 500 

24.854 

1  i 

VER 

#2  -  INT  #0 

-10.354 

11 . 000 

32 . 354 

!  1 

VER 

#2  -  TI  #0 

-14.153 

12.000 

38 . 153 

VER 

#2  -  TI  #1 

-19.764 

14.000 

47.764 

s. 

v  S 

VER 

#2  -  INT  #1 

-8.065 

15.000 

38 . 065 

VER 

#2  -  VOT  #0 

1  .  146 

22.500 

43 .854 

N. 

VER 

#2  -  VOT  #2 

2.646 

24.000 

45.354  *** 

|  . 

VER 

#2  -  VER  #1 

2.646 

24 . 000 

45.354  *** 

"  V: 

VER 

#2  -  TI  #2 

-1.653 

24.500 

50 . 653 

V 

w 

1 

VER 

#2  -  VOT  #1 

11.896 

33 . 250 

54.604  *  *  * 

v  *. 

•.  •' 

VER 

#0  -  ITT  #2 

-35.604 

-14.250 

7 . 104 

VER 

#0  -  ITT  #0 

-35 . 604 

-14.250 

7 . 104 

t 

VER 

#0  -  ITT  #1 

-34.104 

-12.750 

8 . 604 

E 

F  A 

VER 

#0  -  VER  #2 

-23 . 604 

-2.250 

19 . 104 

E  & 

VER 

#0  -  INT  #2 

-20 . 104 

1 . 250 

22 . 604 

K 

VER 

#0  -  INT  #0 

-  12 . 604 

8 . 750 

30. 104 

K  *y 

VER 

#0  -  ITT  #0 

-16 . 403 

9.750 

35 . 903 

IV  vl 

VER 

#0  -  TI  #1 

-22.014 

11.750 

45 .514 

•  ' 

VER 

#0  -  INT  #1 

-10.315 

12.750 

35 .815 

VER 

#0  -  VOT  tO 

-1 . 104 

20 . 250 

41 . 604 

r 

r  /• 

VER 

tO  -  VOT  1 2 

0.396 

21 .750 

43.104  *  *  * 

E  v- 

VER 

#0  -  VER  tl 

0 . 396 

21.750 

43.104  *** 

s 

VER 

#0  -  TI  #2 

-3 . 903 

22 . 250 

48 . 403 

8  o.- 

L. 

VER 

#0  -  VOT  tl 

9 . 646 

31 . 000 

52.354  *** 

>  - 

l  •;:• 

r 


r 

r 


101 


*Vaflh* 


*■  jWCflhVflh  ■ 


A^. 


L 


§ 


i  l 


i 


£ 


I 


A’ 


RECOGNIZER 
*  NOISE 
COMPARISON 


INT 

#2 

- 

ITT 

#2 

INT 

#2 

- 

ITT 

#0 

INT 

#2 

- 

ITT 

#1 

INT 

#2 

- 

VER 

#2 

INT 

#2 

- 

VER 

#0 

INT 

#2 

- 

INT 

#0 

INT 

#2 

- 

TI 

#0 

INT 

#2 

- 

TI 

#1 

INT 

#2 

- 

INT 

#1 

INT 

#2 

- 

VOT 

to 

INT 

#2 

- 

VOT 

#2 

INT 

#2 

- 

VER 

tl 

INT 

#2 

- 

TI 

#2 

INT 

#2 

- 

VOT 

tl 

INT 

#0 

- 

ITT 

t2 

INT 

#0 

- 

ITT 

to 

INT 

#0 

- 

ITT 

#1 

INT 

#0 

- 

VER 

#2 

INT 

#0 

- 

VER 

to 

INT 

#0 

- 

INT 

t2 

INT 

#0 

- 

TI 

to 

INT 

#0 

- 

TI 

tl 

INT 

#0 

- 

INT 

tl 

INT 

#0 

- 

VOT 

to 

INT 

#0 

- 

VOT 

t2 

INT 

#0 

- 

VER 

tl 

INT 

#0 

- 

TI 

t2 

INT 

#0 

- 

VOT 

tl 

T I 

#0 

- 

ITT 

t2 

T I 

#0 

- 

ITT 

to 

TI 

#0 

- 

ITT 

tl 

T I 

#0 

- 

VER 

t2 

TI 

#0 

- 

VER 

to 

TI 

#0 

- 

INT 

t2 

TI 

#0 

- 

INT 

to 

TI 

#0 

- 

TI 

tl 

TI 

#0 

- 

INT 

tl 

TI 

#0 

- 

VOT 

to 

TI 

#0 

- 

VOT 

t2 

TI 

#0 

- 

VOT 

tl 

TI 

#0 

- 

TI 

1 2 

TI 

to 

- 

VER 

tl 

t: 

tl  - 

ITT 

t2 

SIMULTANEOUS 

LOWER 

CONFIDENCE 

LIMIT 

-36.854 
-36.854 
-35.354 
-24.854 
-22.604 
-13.854 
-17.653 
-23.264 
-11 . 565 
-2.354 
-0.854 
-0.854 
-5.153 
8.396 
-44.354 
-44.354 
-42.854 
-32.354 
-30.104 
-28.854 
-25.153 
-30.764 
-19.065 
-9.854 
-8.354 
-8.354 
-12.653 
0.896 

-50.153 
-50.153 
-48.653 
-38.153 
-35.903 
-34 . 653 
-27.153 
-34.986 
-24.568 
-15.653 
-14. 153 
-14.153 
-17.699 
-4.903 

-59.764 


DIFFERENCES 

BETWEEN 

MEANS 

-15.500 

-15.500 

-14.000 

-3.500 

-1.250 

7.500 

8.500 

10 . 500 

11.500 
19.000 

20.500 

20 . 500 
21.000 
29.750 

-23.000 
-23.000 
-21.500 
-11 . 000 
-8.750 
-7.500 
1.000 
3.000 
4.000 

11 . 500 
13.000 
13.000 

13.500 

22.250 

-24.000 
-24.000 
-22 . 500 
-12 . 000 
-9.750 
-8.500 
-1.000 
2.000 
3 . 000 

10 . 500 
12 . 000 
12 . 000 

12 . 500 

21.250 

-26 . 000 


SIMULTANEOUS 

UPPER 

CONFIDENCE 

LIMIT 

5.854 

5.854 

7.354 

17.854 

20.104 

28.854 

34.653 
44.264 
34.656 

40.354 

41.854 

41.854 

47.153 

51 . 104  *** 
- 1 . 646  *  *  * 
-1.646  *** 
-0 . 146  *  *  * 

10.354 

12.604 

13.854 

27.153 
36.764 
27.065 

32.854 

34.354 
34.354 

39.653 

43 . 604  *  *  * 

2.153 

2.153 

3.653 

14.153 

16.403 

17.653 

25.153 
38.986 
30.568 

36 . 653 

38 . 153 
38 .153 
42.699 

47.403 

7 . 764 


                  SIMULTANEOUS                SIMULTANEOUS
RECOGNIZER *             LOWER   DIFFERENCES         UPPER
NOISE               CONFIDENCE       BETWEEN    CONFIDENCE
COMPARISON               LIMIT         MEANS         LIMIT

TI  #1 - ITT #0        -59.764       -26.000         7.764
TI  #1 - ITT #1        -58.264       -24.500         9.264
TI  #1 - VER #2        -47.764       -14.000        19.764
TI  #1 - VER #0        -45.514       -11.750        22.014
TI  #1 - INT #2        -44.264       -10.500        23.264
TI  #1 - INT #0        -36.764        -3.000        30.764
TI  #1 - TI  #0        -38.986        -2.000        34.986
TI  #1 - INT #1        -33.871         1.000        35.871
TI  #1 - VOT #0        -25.264         8.500        42.264
TI  #1 - VOT #2        -23.764        10.000        43.764
TI  #1 - VER #1        -23.764        10.000        43.764
TI  #1 - TI  #2        -26.486        10.500        47.486
TI  #1 - VOT #1        -14.514        19.250        53.014

INT #1 - ITT #2        -50.065       -27.000        -3.935  ***
INT #1 - ITT #0        -50.065       -27.000        -3.935  ***
INT #1 - ITT #1        -48.565       -25.500        -2.435  ***
INT #1 - VER #2        -38.065       -15.000         8.065
INT #1 - VER #0        -35.815       -12.750        10.315
INT #1 - INT #2        -34.565       -11.500        11.565
INT #1 - INT #0        -27.065        -4.000        19.065
INT #1 - TI  #0        -30.568        -3.000        24.568
INT #1 - TI  #1        -35.871        -1.000        33.871
INT #1 - VOT #0        -15.565         7.500        30.565
INT #1 - VOT #2        -14.065         9.000        32.065
INT #1 - VER #1        -14.065         9.000        32.065
INT #1 - TI  #2        -18.068         9.500        37.068
INT #1 - VOT #1         -4.815        18.250        41.315

VOT #0 - ITT #2        -55.854       -34.500       -13.146  ***
VOT #0 - ITT #0        -55.854       -34.500       -13.146  ***
VOT #0 - ITT #1        -54.354       -33.000       -11.646  ***
VOT #0 - VER #2        -43.854       -22.500        -1.146  ***
VOT #0 - VER #0        -41.604       -20.250         1.104
VOT #0 - INT #2        -40.354       -19.000         2.354
VOT #0 - INT #0        -32.854       -11.500         9.854
VOT #0 - TI  #0        -36.653       -10.500        15.653
VOT #0 - TI  #1        -42.264        -8.500        25.264
VOT #0 - INT #1        -30.565        -7.500        15.565
VOT #0 - VOT #2        -19.854         1.500        22.854
VOT #0 - VER #1        -19.854         1.500        22.854
VOT #0 - TI  #2        -24.153         2.000        28.153
VOT #0 - VOT #1        -10.604        10.750        32.104


                  SIMULTANEOUS                SIMULTANEOUS
RECOGNIZER *             LOWER   DIFFERENCES         UPPER
NOISE               CONFIDENCE       BETWEEN    CONFIDENCE
COMPARISON               LIMIT         MEANS         LIMIT

VOT #2 - ITT #2        -57.354       -36.000       -14.646  ***
VOT #2 - ITT #0        -57.354       -36.000       -14.646  ***
VOT #2 - ITT #1        -55.854       -34.500       -13.146  ***
VOT #2 - VER #2        -45.354       -24.000        -2.646  ***
VOT #2 - VER #0        -43.104       -21.750        -0.396  ***
VOT #2 - INT #2        -41.854       -20.500         0.854
VOT #2 - INT #0        -34.354       -13.000         8.354
VOT #2 - TI  #0        -38.153       -12.000        14.153
VOT #2 - TI  #1        -43.764       -10.000        23.764
VOT #2 - INT #1        -32.065        -9.000        14.065
VOT #2 - VOT #0        -22.854        -1.500        19.854
VOT #2 - VER #1        -21.354         0.000        21.354
VOT #2 - TI  #2        -25.653         0.500        26.653
VOT #2 - VOT #1        -12.104         9.250        30.604

VER #1 - ITT #2        -57.354       -36.000       -14.646  ***
VER #1 - ITT #0        -57.354       -36.000       -14.646  ***
VER #1 - ITT #1        -55.854       -34.500       -13.146  ***
VER #1 - VER #2        -45.354       -24.000        -2.646  ***
VER #1 - VER #0        -43.104       -21.750        -0.396  ***
VER #1 - INT #2        -41.854       -20.500         0.854
VER #1 - INT #0        -34.354       -13.000         8.354
VER #1 - TI  #0        -38.153       -12.000        14.153
VER #1 - TI  #1        -43.764       -10.000        23.764
VER #1 - INT #1        -32.065        -9.000        14.065
VER #1 - VOT #0        -22.854        -1.500        19.854
VER #1 - VOT #2        -21.354         0.000        21.354
VER #1 - TI  #2        -25.653         0.500        26.653
VER #1 - VOT #1        -12.104         9.250        30.604

TI  #2 - ITT #2        -62.653       -36.500       -10.347  ***
TI  #2 - ITT #0        -62.653       -36.500       -10.347  ***
TI  #2 - ITT #1        -61.153       -35.000        -8.847  ***
TI  #2 - VER #2        -50.653       -24.500         1.653
TI  #2 - VER #0        -48.403       -22.250         3.903
TI  #2 - INT #2        -47.153       -21.000         5.153
TI  #2 - INT #0        -39.653       -13.500        12.653
TI  #2 - TI  #0        -42.699       -12.500        17.699
TI  #2 - TI  #1        -47.486       -10.500        26.486
TI  #2 - INT #1        -37.068        -9.500        18.068
TI  #2 - VOT #0        -28.153        -2.000        24.153
TI  #2 - VOT #2        -26.653        -0.500        25.653
TI  #2 - VER #1        -26.653        -0.500        25.653
TI  #2 - VOT #1        -17.403         8.750        34.903


                  SIMULTANEOUS                SIMULTANEOUS
RECOGNIZER *             LOWER   DIFFERENCES         UPPER
NOISE               CONFIDENCE       BETWEEN    CONFIDENCE
COMPARISON               LIMIT         MEANS         LIMIT

VOT #1 - ITT #2        -66.604       -45.250       -23.896  ***
VOT #1 - ITT #0        -66.604       -45.250       -23.896  ***
VOT #1 - ITT #1        -65.104       -43.750       -22.396  ***
VOT #1 - VER #2        -54.604       -33.250       -11.896  ***
VOT #1 - VER #0        -52.354       -31.000        -9.646  ***
VOT #1 - INT #2        -51.104       -29.750        -8.396  ***
VOT #1 - INT #0        -43.604       -22.250        -0.896  ***
VOT #1 - TI  #0        -47.403       -21.250         4.903
VOT #1 - TI  #1        -53.014       -19.250        14.514
VOT #1 - INT #1        -41.315       -18.250         4.815
VOT #1 - VOT #0        -32.104       -10.750        10.604
VOT #1 - VOT #2        -30.604        -9.250        12.104
VOT #1 - VER #1        -30.604        -9.250        12.104
VOT #1 - TI  #2        -34.903        -8.750        17.403
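The limits in the recognizer × noise comparisons are consistent with the Tukey-Kramer form of the HSD interval, difference ± q·sqrt(MSE/2 · (1/n_i + 1/n_j)), which widens the interval when a cell has fewer observations (for example TI #1). The sketch below reproduces two rows under stated assumptions: MSE = 64.4803 is taken from the Scenario 2 Tukey headers; the cell means match the Scenario 2 performance means, with each cell's observation count taken as SUM ÷ MEAN; and the critical value q ≈ 5.3186 is back-calculated from the equal-n half-width of 21.354, not read from the report. The names `CELLS` and `tukey_kramer` are illustrative.

```python
import math

# Assumed inputs: pooled error mean square from the Scenario 2 Tukey headers,
# and a critical value back-calculated from the equal-n half-width
# (21.354 / sqrt(MSE / 4)); q is not printed in this part of the report.
MSE = 64.4803
Q_CRIT = 5.3186

# (mean correct sentences, observations) per cell; observations = SUM / MEAN
CELLS = {
    "VOT #1": (50.75, 4),
    "ITT #2": (96.00, 4),
    "TI  #1": (70.00, 1),
    "INT #1": (69.00, 3),
}


def tukey_kramer(cell_i, cell_j, mse=MSE, q=Q_CRIT):
    """Return (lower, difference, upper, significant) for mean_i - mean_j."""
    (mean_i, n_i), (mean_j, n_j) = CELLS[cell_i], CELLS[cell_j]
    diff = mean_i - mean_j
    # Tukey-Kramer half-width accommodates unequal cell sizes
    half_width = q * math.sqrt(mse / 2 * (1 / n_i + 1 / n_j))
    lower, upper = diff - half_width, diff + half_width
    # A comparison is flagged '***' when its interval excludes zero
    significant = lower > 0 or upper < 0
    return round(lower, 3), round(diff, 3), round(upper, 3), significant


for pair in [("VOT #1", "ITT #2"), ("TI  #1", "INT #1")]:
    print(" - ".join(pair), tukey_kramer(*pair))
```

With these inputs, the first pair gives limits of about -66.604 and -23.896 (flagged), and the second gives about -33.871 and 35.871 (not flagged), in line with the tabled rows.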


Appendix  XV.  Rejection/Misrecognition  Matrices:  Scenario  2


NUMBER OF WORDS REJECTED
(Across all Speakers)

Rows: RECOGNIZER (VOT, TI, INT, VER, ITT) by NOISE (#0, #1, #2)
Columns (SENTENCE LENGTH): 3 WORD = sentences 1-12, TOTAL; 4 WORD = sentences 13-18, TOTAL; 5 WORD = sentences 19-24, TOTAL; TOT REJ.


V 

#0

0  0 

0  0 

0 

0 

0 

0 

T 

#1

0  0 

0  0 

0 

0 

0 

A 

N 

#2

0  0 

1 

1  .01 

0 

0 

0 

#0

1 

1 

1 

1 

2 

1 

3 

10  .10 

2 

2  .04 

1 

2 

1 

4 

.08 

16 

T 

I 

#1

1 

3 

2 

2 

2 

10  .21 

0  0 

0 

0 

10 

#2

4 

1 

6 

6 

1 

4 

4 

26  .27 

1 

5 

3 

1 

10  .21 

1 

2 

1 

1  5 

.10 

41 

I 

N 

#0

4 

4 

4 

4 

4 

1 

3 

1 

7 

3 

2 

37  .19 

2 

7 

1 

2 

4 

4 

20  .21 

13 

1 

7 

1  22 

.23 

79 

#1

2 

1 

3 

5 

3 

1 

4 

4 

23  .16 

3 

2 

1 

2 

5 

13  .18 

7 

1 

2 

2 

1 

1  14 

.19 

50 

T 

#2

3 

2  1 

1 

1 

2 

10  .05 

3 

2 

1 

4 

3 

13  .14 

6 

1 

2 

4 

13 

.14 

36 

#0

L 

1 

1 

3 

3 

4 

13  .07 

1 

1 

6 

1 

9  .09 

9 

4 

4 

17 

.18 

39 

V 

E 

#1

4 

4  4 

5 

9 

7 

5 

2 

1 

4 

3 

48  .25 

5 

5 

1 

4 

6 

5 

26  .27 

11 

4 

3  11 

1 

30 

.31 

104 

R 

#2

1 

1 

1 

1 

4  .02 

2 

1 

1 

4  .04 

5 

2 

2 

1  10 

.10 

18

I 

T 

#0

0  0 

0  0 

0 

0 

0 

#1

0  0 

0  0 

0 

0 

0 

T 

#2

0  0 

0  0 

0 

0 

0 

#0  NO  NOISE
#1  INDUSTRIAL
#2  FAST  FOOD  RESTAURANT


( 


NUMBER OF WORDS MISRECOGNIZED
(Across all Speakers)

Rows: RECOGNIZER (VOT, TI, INT, VER, ITT) by NOISE (#0, #1, #2)
Columns (SENTENCE LENGTH): 3 WORD = sentences 1-12, TOTAL; 4 WORD = sentences 13-18, TOTAL; 5 WORD = sentences 19-24, TOTAL; TOT MISRECOG.

A 

#0

7 

3 

9 

1 

2 

1  9 

7 

5 

5 

5 

54  .28 

1 

6 

9  12  11 

4 

43 

.45  12 

7 

2  14 

4 

2 

41 

.43 

138 

yj 

T 

1 

#1

9 

2 

5 

3 

7 

5  10 

8 

1  7 

7 

6 

70  .36 

7 

6 

8  12  13 

5 

51 

.53  15 

8 

5  15 

8 

8 

59 

.61 

180 

A 

N 

#2

11 

2 

7 

2 

3 

6  9 

6 

1  6 

9 

4 

66  .34 

2 

5 

3  11  10 

5 

36 

.38 

9 

8 

2  12 

7 

2 

40 

.42 

131 

#0

1 

2 

1 

4  .04 

1 

3 

1 

5 

.10 

2 

2 

1 

1 

4 

10 

.21 

18 

T 

I 

#1

2 

2 

4  .08 

1 

3 

4 

.17 

2 

3 

2 

1 

8 

.33 

16 

#2

1 

2 

3 

2 

1 

2 

2 

13  .14 

1 

1 

3 

5 

.10 

3 

4 

1 

5 

1 

14 

.30 

32 

I 

ii 

#0

2 

2  .01 

1 

3 

1 

5 

.05 

5 

5 

.05 

12 

(1 

T 

r 

#1

1 

5 

5 

11  .08 

1 

1 

2 

4 

2 

10 

.14 

3 

1 

4 

8 

.11 

29 

L 

R 

#2

1 

1 

3 

5  .03 

1 

1 

3 

4 

9 

.09 

5 

4 

9 

.09 

23 

V 

F 

#0

1 

1  .01 

6 

1 

7 

.07 

4 

6 

10 

.10 

18 

b 

R 

#1

2 

3 

1 

1 

4 

1 

2 

1 

1 

2 

18  .09 

1 

2 

5 

3 

11 

.11 

1 

3 

1 

5 

.05 

34 

B 

E 

X 

#2

3 

2 

1 

1 

1 

8  .04 

4 

2 

1 

7 

.07 

4 

3 

8 

15 

.16 

30 

I 

T 

? 

#0

0  .0 

0 

.0 

0 

.0 

0 

#1

1 

1 

1 

3  .02 

1 

1 

.01 

1 

1 

2 

.02 

6 

1 

#2

0  .0 

0 

.0 

0 

.0 

0 

#0  NO  NOISE
#1  INDUSTRIAL
#2  FAST  FOOD  RESTAURANT


107 




Appendix  XVI.  Misrecognition  Error  Trees:  Scenario  2




Appendix  XVII.  Performance  Means:  Scenario  2 


Scenario 2

CORRECT SENTENCES

                                         TOTAL CORRECT /
RECOGNIZER   NOISE        SUM      MEAN  TOTAL UTTERANCES
VOT            0       246.00     61.50        .64
               1       203.00     50.75        .53
               2       240.00     60.00        .63
TI             0       144.00     72.00        .75
               1        70.00     70.00        .73
               2       119.00     59.50        .62
INT            0       292.00     73.00        .76
               1       207.00     69.00        .72
               2       322.00     80.50        .84
VER            0       327.00     81.75        .85
               1       240.00     60.00        .63
               2       336.00     84.00        .88
ITT            0       384.00     96.00       1.00
               1       378.00     94.50        .98
               2       384.00     96.00       1.00
ALL                   3892.00     74.85        .78

Scenario 2

SENTENCES REJECTED

                                         TOTAL REJECTED /
RECOGNIZER   NOISE        SUM      MEAN  TOTAL UTTERANCES
VOT            0         0.00      0.00        .00
               1         0.00      0.00        .00
               2         1.00      0.25       .003
TI             0        16.00      8.00        .08
               1        10.00     10.00        .10
               2        41.00     20.50        .21
INT            0        80.00     20.00        .21
               1        51.00     17.00        .18
               2        36.00      9.00        .09
VER            0        39.00      9.75        .10
               1       109.00     27.25        .28
               2        18.00      4.50        .05
ITT            0         0.00      0.00        .00
               1         0.00      0.00        .00
               2         0.00      0.00        .00
ALL                    401.00      7.71

Scenario 2

SENTENCES MISRECOGNIZED

                                         TOTAL MISRECOGNIZED /
RECOGNIZER   NOISE        SUM      MEAN  TOTAL UTTERANCES
VOT            0       138.00     34.50        .36
               1       181.00     45.25        .47
               2       143.00     35.75        .37
TI             0        32.00     16.00        .17
               1        16.00     16.00        .17
               2        32.00     16.00        .17
INT            0        12.00      3.00        .03
               1        30.00     10.00        .10
               2        26.00      6.50        .07
VER            0        18.00      4.50        .05
               1        35.00      8.75        .09
               2        30.00      7.50        .08
ITT            0         0.00      0.00        .00
               1         6.00      1.50        .02
               2         0.00      0.00        .00
ALL                    699.00     13.44
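Across the three Scenario 2 tables, the correct, rejected, and misrecognized MEAN values for every recognizer/noise cell add up to 96, and each ratio column is MEAN ÷ 96. The check below treats 96 as the number of utterances per observation; that divisor is inferred from the tables rather than stated in them, and the dictionary and function names are illustrative.

```python
import math

UTTERANCES = 96  # inferred divisor behind the TOTAL UTTERANCES ratio columns

# Mean sentences per observation for noise conditions 0, 1, 2
CORRECT = {"VOT": [61.50, 50.75, 60.00], "TI": [72.00, 70.00, 59.50],
           "INT": [73.00, 69.00, 80.50], "VER": [81.75, 60.00, 84.00],
           "ITT": [96.00, 94.50, 96.00]}
REJECTED = {"VOT": [0.00, 0.00, 0.25], "TI": [8.00, 10.00, 20.50],
            "INT": [20.00, 17.00, 9.00], "VER": [9.75, 27.25, 4.50],
            "ITT": [0.00, 0.00, 0.00]}
MISRECOGNIZED = {"VOT": [34.50, 45.25, 35.75], "TI": [16.00, 16.00, 16.00],
                 "INT": [3.00, 10.00, 6.50], "VER": [4.50, 8.75, 7.50],
                 "ITT": [0.00, 1.50, 0.00]}


def cell_totals():
    """Sum of the three outcome means for every recognizer/noise cell."""
    return {(rec, noise): CORRECT[rec][noise] + REJECTED[rec][noise]
            + MISRECOGNIZED[rec][noise]
            for rec in CORRECT for noise in range(3)}


for (rec, noise), total in cell_totals().items():
    ratio = CORRECT[rec][noise] / UTTERANCES
    status = "ok" if math.isclose(total, UTTERANCES) else "check"
    print(f"{rec} #{noise}: outcomes sum to {total:.2f} ({status}), "
          f"correct ratio {ratio:.3f}")
```

The same check also ties the VOT #1 misrecognized mean of 45.25 to a SUM of 181 over four observations.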



TUKEY'S STUDENTIZED RANGE (HSD) TEST FOR VARIABLE: CORRECT
ALPHA=0.05  CONFIDENCE=0.95  DF=24  MSE=64.4803
CRITICAL VALUE OF STUDENTIZED RANGE=4.166

COMPARISONS SIGNIFICANT AT THE 0.05 LEVEL ARE INDICATED BY '***'

                  SIMULTANEOUS                SIMULTANEOUS
                         LOWER   DIFFERENCES         UPPER
RECOGNIZER          CONFIDENCE       BETWEEN    CONFIDENCE
COMPARISON               LIMIT         MEANS         LIMIT

ITT - VER               10.592        20.250        29.908  ***
ITT - INT               10.989        20.864        30.739  ***
ITT - TI                16.307        28.900        41.493  ***
ITT - VOT               28.425        38.083        47.741  ***

VER - ITT              -29.908       -20.250       -10.592  ***
VER - INT               -9.261         0.614        10.489
VER - TI                -3.943         8.650        21.243
VER - VOT                8.175        17.833        27.491  ***

INT - ITT              -30.739       -20.864       -10.989  ***
INT - VER              -10.489        -0.614         9.261
INT - TI                -4.723         8.036        20.796
INT - VOT                7.345        17.220        27.095  ***

TI  - ITT              -41.493       -28.900       -16.307  ***
TI  - VER              -21.243        -8.650         3.943
TI  - INT              -20.796        -8.036         4.723
TI  - VOT               -3.409         9.183        21.776

VOT - ITT              -47.741       -38.083       -28.425  ***
VOT - VER              -27.491       -17.833        -8.175  ***
VOT - INT              -27.095       -17.220        -7.345  ***
VOT - TI               -21.776        -9.183         3.409
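The recognizer comparisons pool each recognizer over the three noise conditions. As a sketch, assuming the per-cell observation counts are SUM ÷ MEAN from the Scenario 2 correct-sentence table and that the unequal counts are handled with the Tukey-Kramer adjustment, the pooled means and simultaneous limits can be recomputed as below. Small third-decimal differences from the printed limits come from using the rounded critical value 4.166; the names `CORRECT`, `pooled`, and `hsd_interval` are illustrative.

```python
import math

MSE = 64.4803   # error mean square from the test header
Q_CRIT = 4.166  # printed critical studentized range (5 recognizers, DF=24)

# Correct-sentence SUM and observation count for noise conditions 0, 1, 2;
# counts are assumed to be SUM / MEAN from the performance-means table.
CORRECT = {
    "ITT": ([384, 378, 384], [4, 4, 4]),
    "VER": ([327, 240, 336], [4, 4, 4]),
    "INT": ([292, 207, 322], [4, 3, 4]),
    "TI": ([144, 70, 119], [2, 1, 2]),
    "VOT": ([246, 203, 240], [4, 4, 4]),
}


def pooled(recognizer):
    """Pooled mean and total observation count for one recognizer."""
    sums, counts = CORRECT[recognizer]
    return sum(sums) / sum(counts), sum(counts)


def hsd_interval(a, b):
    """Tukey-Kramer simultaneous interval for mean(a) - mean(b)."""
    (mean_a, n_a), (mean_b, n_b) = pooled(a), pooled(b)
    diff = mean_a - mean_b
    half_width = Q_CRIT * math.sqrt(MSE / 2 * (1 / n_a + 1 / n_b))
    return diff - half_width, diff, diff + half_width


for other in ["VER", "INT", "TI", "VOT"]:
    lo, diff, hi = hsd_interval("ITT", other)
    flag = "***" if lo > 0 or hi < 0 else ""
    print(f"ITT - {other:<3} {lo:8.3f} {diff:8.3f} {hi:8.3f} {flag}")
```

Because TI contributes only five observations, its intervals are visibly wider than those for the recognizers with twelve.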



TUKEY'S STUDENTIZED RANGE (HSD) TEST FOR VARIABLE: CORRECT
ALPHA=0.05  CONFIDENCE=0.95  DF=24  MSE=64.4803
CRITICAL VALUE OF STUDENTIZED RANGE=3.532

COMPARISONS SIGNIFICANT AT THE 0.05 LEVEL ARE INDICATED BY '***'

                  SIMULTANEOUS                SIMULTANEOUS
                         LOWER   DIFFERENCES         UPPER
NOISE               CONFIDENCE       BETWEEN    CONFIDENCE
COMPARISON               LIMIT         MEANS         LIMIT

2 - 0                   -6.240         0.444         7.129
2 - 1                    2.318         9.208        16.098  ***
0 - 2                   -7.129        -0.444         6.240
0 - 1                    1.874         8.764        15.654  ***
1 - 2                  -16.098        -9.208        -2.318  ***
1 - 0                  -15.654        -8.764        -1.874  ***

0 = NO NOISE
1 = INDUSTRIAL NOISE
2 = FAST-FOOD RESTAURANT NOISE
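The noise comparisons use the same interval with three groups, so the critical value drops to 3.532. A minimal sketch, assuming each noise condition pools the recognizer SUMs and observation counts (SUM ÷ MEAN) from the Scenario 2 correct-sentence table; under that assumption the 0 - 1 difference works out to 8.764. `NOISE` and `noise_interval` are illustrative names.

```python
import math

MSE, Q_CRIT = 64.4803, 3.532  # values from the test header

# Pooled correct-sentence SUM and observation count per noise condition,
# summed over VOT, TI, INT, VER, ITT in that order
NOISE = {
    0: (246 + 144 + 292 + 327 + 384, 4 + 2 + 4 + 4 + 4),  # no noise
    1: (203 + 70 + 207 + 240 + 378, 4 + 1 + 3 + 4 + 4),   # industrial
    2: (240 + 119 + 322 + 336 + 384, 4 + 2 + 4 + 4 + 4),  # fast-food restaurant
}


def noise_interval(a, b):
    """Tukey-Kramer simultaneous interval for mean(a) - mean(b)."""
    (sum_a, n_a), (sum_b, n_b) = NOISE[a], NOISE[b]
    diff = sum_a / n_a - sum_b / n_b
    half_width = Q_CRIT * math.sqrt(MSE / 2 * (1 / n_a + 1 / n_b))
    return diff - half_width, diff, diff + half_width


for a, b in [(2, 0), (2, 1), (0, 1)]:
    lo, diff, hi = noise_interval(a, b)
    flag = "***" if lo > 0 or hi < 0 else ""
    print(f"{a} - {b}: {lo:7.3f} {diff:7.3f} {hi:7.3f} {flag}")
```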


W. 


* 


/. 


.v 

.V 


117 


Appendix  XIX.  Rejection/Misrecognition  Matrices:  Scenario  3


NOISE  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  TOTAL


V  NONE  14  4312  34  3233  2  37  3253  58

E 

R  INDUSTRIAL  2  7  1  5  3  7  1  7  3  3  4  3  6  4  2  5  5  3  4  4  6  4  5  5  99 

B 

E  FAST  FOOD 

X  RESTAURANT  1  4  1  4  2  2  2  2  2  1  1  3  5  1  31 




I  NONE  7  1  1  9  3  8  2  5  8  3  4  3  2  7  4  4  5  3  9  4  7  3  6  10  118 

N 


R  INDUSTRIAL  3  6  1  4  7  8  2  5  6 

S 


2  1  6  1  6  5  1  8  3  5  3  3  7  101 


T  FAST  FOOD 

E  RESTAURANT  1  3  3 


l  3  1  1 


11  l  1  1  1  23 


I  NONE

T  INDUSTRIAL 
T  FAST  FOOD 
RESTAURANT 


Rejections  -  Connected  Speech  Scenario  3


                  VERBEX                INTERSTATE                  ITT
            0   1   2 TOTAL     %   0   1   2 TOTAL     %   0   1   2 TOTAL     %  TOTAL
1 DIGIT    12  18  12    42   .23  18  21   4    43   .24   0   0   0     0     0     85
2 DIGIT    10  18   6    34   .19  26  21   4    51   .28   0   1   0     1   .01     86
3 DIGIT    11  20   7    38   .21  19  17   5    41   .23   0   0   0     0     0     79
4 DIGIT    12  19   5    36   .20  25  21   6    52   .29   0   1   0     1   .01     89
5 DIGIT    13  24   1    38   .21  30  21   4    55   .31   0   1   0     1   .01     94
SENT. 1     2  11   1    14   .08  14   7   0    21   .12   0   0   0     0     0     35
SENT. 2    14  29   8    51   .28  30  33  12    75   .42   0   1   0     1   .01    127
SENT. 3     4  14   2    20   .11  14  13   2    29   .16   0   2   0     2   .01     51
SENT. 4    18  26   7    51   .28  17  14   2    33   .18   0   0   0     0     0     84
SENT. 5    20  19  13    52   .29  43  34   7    84   .47   0   0   0     0     0    136

Rejection  by  Sentence  -  Scenario  3 
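Each row of the rejection-by-sentence table can be cross-checked: a recognizer's three noise counts should add to that recognizer's TOTAL, and the three recognizer totals should add to the row TOTAL. The sketch below runs that check on the rows whose cells are fully legible in the scan; the data layout and the function name are illustrative.

```python
# Rejection counts per row: per-recognizer noise counts (noise 0, 1, 2) for
# VERBEX, INTERSTATE, ITT; the printed recognizer totals; the printed row total.
ROWS = {
    "2 DIGIT": ([[10, 18, 6], [26, 21, 4], [0, 1, 0]], [34, 51, 1], 86),
    "3 DIGIT": ([[11, 20, 7], [19, 17, 5], [0, 0, 0]], [38, 41, 0], 79),
    "4 DIGIT": ([[12, 19, 5], [25, 21, 6], [0, 1, 0]], [36, 52, 1], 89),
    "5 DIGIT": ([[13, 24, 1], [30, 21, 4], [0, 1, 0]], [38, 55, 1], 94),
    "SENT. 2": ([[14, 29, 8], [30, 33, 12], [0, 1, 0]], [51, 75, 1], 127),
    "SENT. 3": ([[4, 14, 2], [14, 13, 2], [0, 2, 0]], [20, 29, 2], 51),
    "SENT. 4": ([[18, 26, 7], [17, 14, 2], [0, 0, 0]], [51, 33, 0], 84),
    "SENT. 5": ([[20, 19, 13], [43, 34, 7], [0, 0, 0]], [52, 84, 0], 136),
}


def row_is_consistent(by_noise, recognizer_totals, row_total):
    """True when noise counts add to each recognizer total and those add to the row total."""
    sums_match = all(sum(counts) == total
                     for counts, total in zip(by_noise, recognizer_totals))
    return sums_match and sum(recognizer_totals) == row_total


for label, row in ROWS.items():
    print(label, "consistent" if row_is_consistent(*row) else "MISMATCH")
```

The same arithmetic is what allows a single illegible cell in a row to be recovered from the printed totals.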




                  VERBEX                INTERSTATE                  ITT
            0   1   2 TOTAL     %   0   1   2 TOTAL     %   0   1   2 TOTAL     %  TOTAL
1 DIGIT     2   1   0     3   .02   0   ?   ?     5   .03   ?   ?   5    13   .07     21
2 DIGIT     3  10   3    16   .09   0   2   4     6   .03   7   9   9    25   .14     47
3 DIGIT     4   2   6    12   .07   2   4   5    11   .06   5  10   9    24   .13     47
4 DIGIT     2   5   4    11   .06   0   4   8    12   .07  10  10   9    29   .16     52
5 DIGIT     0   1   1     2   .01   0   0   6     6   .03   5   8   7    20   .11     28
SENT. 1     2   ?   ?    13   .07   0   ?   ?     7   .04   ?   ?   7    16   .09     36
SENT. 2     4   3   2     9   .05   1   1   8    10   .06   4  18  14    36   .20     55
SENT. 3     0   4   1     5   .03   1   3   1     5   .03   7   7   9    23   .13     33
SENT. 4     5   4   6    15   .08   0   5   5    10   .06   5   4   6    15   .08     40
SENT. 5     0   2   0     2   .01   0   0   8     8   .04  10   8   3    21   .12     31

Misrecognition  by  Sentence  -  Scenario  3


NOISE  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  TOTAL


V  NONE  11

E 

1  2 

4 

1  1 

11 

R  INDUSTRIAL  1 

6  12  1 

1 

1 

1112 

1 

19 

6 

E  FAST  FOOD 

X  RESTAURANT 

3 

1  1 

4 

1  1  2 

1 

14 

I  NONE  11  2

N 
T 
E 

R  INDUSTRIAL  1112  1211  3  13

S 
T 
A 

T  FAST  FOOD 

E  RESTAURANT  2  2  1  1  3  1  1  1  2  3  2  3  1  25 


I  NONE  3  2  1  4  1  4  1  2  5  2  1  3  1  30

T  INDUSTRIAL  4  1  1  4  2  2  6  1  3  2  1  4  3  2  4  2  42 

T  FAST  FOOD 

RESTAURANT  2  3  2  3  4  2  6  1  1  1  2  2  3  2  2  3  39 


Misrecognitions  -  Connected  Speech  Scenario  3


[Figure: misrecognition error trees (omissions, insertions, substitutions). Legible values: Omissions 0.67, Insertions 2.00, Substitutions 5.33; Insertions 2.00, Substitutions 3.00; Insertions 4.67, Substitutions 8.33.]


